MST-001
Foundation in Mathematics and Statistics

Indira Gandhi National Open University
School of Sciences

Block 4
PRESENTATION OF DATA
UNIT 13 Classification and Tabulation of Data
UNIT 14 Diagrammatic Presentation of Data
UNIT 15 Graphical Presentation of Data-I
UNIT 16 Graphical Presentation of Data-II
Curriculum and Course Design Committee
Prof. K.R. Srivathsan Prof. Rahul Roy
Pro-Vice Chancellor Maths and Stat. Unit
IGNOU, New Delhi Indian Statistical Institute, New Delhi

Prof. Parvin Sinclair Dr. Diwakar Shukla


Pro-Vice Chancellor Department of Mathematics and Statistics
IGNOU, New Delhi Dr. Hari Singh Gaur University, Sagar(MP)

Prof. Geeta Kaicker Prof. G.N. Singh


Director, School of Sciences Department of Applied Mathematics
IGNOU, New Delhi I.S.M. Dhanbad

Prof. R.M. Pandey Prof. Rakesh Srivastava


Department of Bio-Statistics Department of Statistics
All India Institute of Medical Sciences M.S. University
New Delhi Vadodara (Gujarat)

Prof. Jagdish Prasad Dr. Gulshan Lal Taneja


Department of Statistics Department of Mathematics
University of Rajasthan, Jaipur M.D. University, Rohtak

Faculty Members, School of Sciences, IGNOU


Statistics Mathematics
Dr. Neha Garg Dr. Deepika
Dr. Nitin Gupta Prof. Poornima Mital
Mr. Rajesh Kaliraman Prof. Sujatha Varma
Dr. Manish Trivedi Dr. S. Venkataraman

Block Preparation Team


Content Writer Language Editor
Dr. Manish Trivedi Dr. Parmod Kumar
Reader in Statistics Assistant Professor
School of Sciences School of Humanities, IGNOU
IGNOU, New Delhi Formatted By
Content Editor Mr. Rajesh Kaliraman
Dr. Gulshan Lal Taneja School of Sciences, IGNOU.
Associate Professor
Secretarial Support
Department of Mathematics
M.D. University, Rohtak Ms. Preeti

Course Coordinator: Mr. Rajesh Kaliraman


Programme Coordinator: Dr. Manish Trivedi

Block Production
Mr. Y. N. Sharma, SO (P), School of Sciences, IGNOU
CRC prepared by Mr. Rajesh Kaliraman, SOS, IGNOU and Ms. Preeti

Acknowledgement: We gratefully acknowledge Prof. Geeta Kaicker, Director, School of Sciences and
Prof. Parvin Sinclair, Director, NCERT for reading the course material and providing their valuable
suggestions to improve the Course.
March, 2012
© Indira Gandhi National Open University, 2012
ISBN – 978-81-266-5973-9

All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other
means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the
University’s office at Maidan Garhi, New Delhi-110 068.
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi by Director,
School of Sciences.
Printed at: Gita Offset Printers Pvt. Ltd., C-90, Okhla Indl. Area-I, New Delhi-20
BLOCK 4 PRESENTATION OF DATA
In the previous block, we became familiar with the origin, development,
definition and importance of statistics and its applications in different areas.
We also discussed the collection of data and the preparation of questionnaires
to collect information. Once the information has been collected in the form of
data, it needs to be arranged in a proper manner, because the collected data
may be huge in volume. In statistical terminology, this proper arrangement of
data is known as presentation of data. In this block, we shall learn some basic
tools for presenting the collected data. The frequently used tools may be
classified into three basic forms: statistical tables, diagrams and graphs. This
block is devoted to these tools and is organised into the following four units.
Unit 13: After the collection of data, the next step is classification, followed
by tabulation. Unit 13 discusses what we mean by classification and tabulation
of data.
Unit 14: A pictorial presentation of the tabulated data may be made with the
help of either diagrams or graphs. Unit 14 discusses some commonly used
diagrams.
Units 15 and 16: These units discuss different types of graphical presentation
of data, namely graphs for frequency distributions, graphs for time series data,
stem-and-leaf displays and box plots.
Notations and Symbols
f : frequency
N = Σf : total of all frequencies
C. I. : class interval
°C : degree Celsius
UNIT 13 CLASSIFICATION AND TABULATION OF DATA

Structure
13.1 Introduction
Objectives
13.2 Classification of Data
13.3 Tabulation of Data
13.4 Summary
13.5 Solutions/Answers

13.1 INTRODUCTION
In Unit 12 of Block 3 of this course, we discussed some methods of data
collection, applicable whether the target population from which the information
is collected is small or large. After the collection of data, the next step is
to classify the data in such a manner that they become ready for proper
presentation.
The need for proper presentation arises because statistical data in their raw
form almost defy comprehension. When data are presented in an easy-to-read
form, the reader can acquire knowledge in a much shorter period of time, and
statistical analysis is also facilitated.
A statistical table is a presentation of numbers in a logical arrangement, with
some brief explanation to show what they are. However, before tabulating data,
it is often necessary to classify them first. So, the concept of classification
is described in Sec. 13.2 of the unit and that of tabulation is discussed in
Sec. 13.3.
Objectives
After studying this unit, you should be able to:
• classify a data set according to the nature of the data;
• construct a discrete frequency distribution for a discrete type of data;
• construct a continuous frequency distribution for a continuous type of data;
• classify the collected data according to the class intervals; and
• arrange the data into a suitable form of a table.

13.2 CLASSIFICATION OF DATA


This unit covers the classification of given data and their presentation in
tabular form. After collection, classification is the next step in processing
the collected data. Classification means grouping related facts into different
classes. Information in one class differs from that in another class with
respect to some characteristic. Sorting particulars on one basis of
classification and then on another basis is called cross-classification. This
process can be repeated as many times as there are possible bases of
classification. Classification of data is very similar to the sorting of letters
in a post office. Let us explain it further by considering a situation where a
university receives applications from candidates for posts in its various
departments or disciplines. The applications received are sorted according to
the departments or disciplines to which they pertain, i.e. in accordance with
their destinations, such as Social Sciences, Engineering, Basic Sciences, etc.
They are then put in separate bundles, each containing applications with a
common characteristic, viz. the same discipline. Classification of statistical
data is comparable to this sorting process. Classification brings out the
important information gathered while dropping unnecessary details, and it
enables comparison and statistical treatment of the material collected. Now the
question may arise in your mind as to how collected data are classified. The
answer is given under the heading "Types of Classification" below.
13.2.1 Types of Classification
Broadly, data can be classified under following categories:
(i) Geographical classification
(ii) Chronological classification
(iii) Qualitative classification
(iv) Quantitative classification
Let us discuss these one by one:
(i) Geographical Classification
In geographical classification, data are classified on the basis of location,
region, etc. For example, if we present the data regarding the production of
sugarcane or wheat or rice according to the four main regions of India, this
would be known as geographical classification, as shown in Table 13.1.
Geographical classifications are usually listed in alphabetical order for easy
reference. Items may also be listed by size to emphasise the magnitude of the
areas under consideration, such as ranking the states by population.
Normally, in reference tables, the first approach (i.e. listing in alphabetical
order) is followed.
Table 13.1: Classification of Production of Wheat
Region              Production of Wheat (in '000 kg)
Eastern Region      2873
Northern Region     1646
Southern Region     2059
Western Region       986

(ii) Chronological Classification


Classification of data observed over a period of time is known as chronological
classification. For example, let us consider the profit figures of a company
for the years 2001 to 2010, as shown below.
Table 13.2: Profits of the Company from Year 2001 to 2010
Year    Profit (in crores of rupees)    Year    Profit (in crores of rupees)
2001    20                              2006    12
2002    21                              2007    25
2003    10                              2008    14
2004    18                              2009    19
2005    15                              2010    23

Time series data are usually listed in chronological order, normally in
ascending order of time, like 2001, 2002, …. When the major emphasis falls on
the most recent events, a reverse time order may be used.
(iii) Quantitative Classification
Quantitative classification refers to the classification of data according to some
characteristics that can be measured numerically such as height, weight,
income, age, sales, etc. For example, the employees of an institute may be
classified according to their pay scales as follows:
Table-13.3: Quantitative Classification of 840 Employees According to their Pay Scales
Scale of Pay Number of Employees
9300 - 34800 467
15600 - 39100 215
37400 - 67000 158
Total 840

A quantitative classification is a combination of two elements, namely the
variable (here, the pay scale) and the frequency (the number of employees in
each class). In the above example, 467 employees draw salary in the pay scale
9300-34800, 215 employees draw salary in the pay scale 15600-39100, and so on.
Quantitative classification gives rise to a frequency distribution, which is
discussed in Sub-section 13.2.2.
(iv) Qualitative Classification
In qualitative classification, data are classified on the basis of some
attributes or qualitative characteristics such as sex, colour of hair, literacy,
religion, etc. You should note that in this type of classification the attribute
under study cannot be measured quantitatively. One can only count its presence
or absence among the individuals of the population under study. For example, in
the case of colour blindness, we may find out how many persons are colour blind
in a given population, but we do not measure the degree of colour blindness in
each case. Thus, when only one attribute is studied, two classes are formed: one
for those possessing the attribute and the other for those not possessing it.
This type of classification is known as simple classification. For example, the
population under study may be divided into two categories based on the
characteristic 'colour blindness' as follows:
Population

Persons with Colour Blindness Persons without Colour Blindness

In a similar manner, we may classify the population of a colony on the basis of
educational qualification, employment, sex, etc. This type of classification,
in which only two classes are formed, is called two-fold or dichotomous
classification. If, instead of forming only two classes, we further divide the
data within those classes on the basis of some other attributes, the
classification is known as manifold classification. For example, we may first
divide the population into 'men' and 'women' on the basis of the attribute
'sex'. Each of these classes may be further subdivided into literate and
illiterate on the basis of the attribute 'literacy'. Further classification can
be done on the basis of some other attribute, say employment. This manifold
classification is shown as follows:

Population
    Men
        Literate:    Employed, Unemployed
        Illiterate:  Employed, Unemployed
    Women
        Literate:    Employed, Unemployed
        Illiterate:  Employed, Unemployed

Now, you can try the following exercises:


E1) The production of wheat (in '000 kg) is 230, 376, 136 and 583 for the
cities Bhopal, Agra, Mumbai and Chandigarh respectively. Classify the data.
E2) A company manufactured a product from 2001 to 2010 and earned profits
(in crores of rupees) of 10, 15, 13, 17, 12, 16, 17, 21, 20, 18 in these
10 years respectively. Classify the given data.

13.2.2 Frequency Distribution


When observations, whether discrete or continuous, are available on a single
characteristic of a large number of individuals, it becomes necessary to
condense the data as far as possible without losing any information of interest.
Let us consider the ages of 30 students selected at random from among those
studying in a certain class.
20, 22, 25, 22, 21, 22, 25, 24, 23, 22, 21, 20, 21, 22, 23, 25, 23, 24, 22, 24, 21,
20, 23, 21, 22, 21, 20, 21, 22, 25.
This presentation of the data is not considered good, since for a large number
of observations it is not easy to handle the data in this form. A better way to
express the figures is shown in Table 13.4 below:
Table–13.4: Frequency Distribution of 30 Students According to their Age
Age of students Tally Mark Frequency
20 |||| 04
21 |||| || 07
22 |||| ||| 08
23 |||| 04
24 ||| 03
25 |||| 04
Total 30

A bar (|), called a tally mark, is put against the number each time it occurs.
After putting this mark four times against a value, a cross tally is put across
these four tallies for the fifth mark, as shown in the above table. From the
sixth mark onwards, we start afresh in a similar manner. This technique
facilitates easy counting of the tally marks at the end. The presentation of the
data as given in Table 13.4 is known as a frequency distribution.
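The tally procedure described above can also be carried out mechanically. The following is a minimal Python sketch, not part of the original unit, that counts the 30 ages listed earlier and prints a table like Table 13.4. The helper name `tally` and the text rendering of a crossed bundle of five as `||||/` are assumptions made only for this illustration.

```python
from collections import Counter

# Ages of the 30 students, as listed in the text
ages = [20, 22, 25, 22, 21, 22, 25, 24, 23, 22, 21, 20, 21, 22, 23,
        25, 23, 24, 22, 24, 21, 20, 23, 21, 22, 21, 20, 21, 22, 25]

def tally(count):
    """Render a count as tally marks; each full group of five is shown as '||||/'."""
    bundles, rest = divmod(count, 5)
    parts = ["||||/"] * bundles
    if rest:
        parts.append("|" * rest)
    return " ".join(parts)

# Counter does the counting that the tally marks do by hand
frequency = Counter(ages)

for age in sorted(frequency):
    print(f"{age:>3}  {tally(frequency[age]):<12} {frequency[age]:>2}")
print(f"Total {sum(frequency.values())}")
```

The counts obtained this way should agree with the Frequency column of Table 13.4, which gives a quick check on a hand tally.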
A frequency distribution refers to the data which are classified on the basis of
some variables that can be measured such as wages, age of children, etc. A
variable refers to the characteristic that varies in magnitude in a frequency
distribution. It may be either discrete or continuous. A discrete variable is that
which generally takes integer values. For example, the number of students, the
number of books, etc. A continuous variable can take integer or fractional
values within the range of possibilities, such as the height or weight of
individuals. Generally speaking, continuous data are obtained through
measurements while discrete data are derived by counting. A series described
by a continuous variable is called continuous series. Similarly, series
represented by a discrete variable is called discrete series.
According to the nature of the variable, the frequency distribution may be of
two types, i.e. discrete frequency distribution and continuous frequency
distribution. Let us discuss them one by one.

Discrete Frequency Distribution

A frequency distribution in which the information is distributed in different
classes on the basis of a discrete variable is known as a discrete frequency
distribution. For example, the frequency distribution of the number of children
in 20 families is a discrete frequency distribution, as shown in Table 13.5.
Table 13.5: Frequency Distribution of the Number of Children in 20 Families
No. of children   Tally Mark   Frequency
0                 |||          3
1                 ||||         4
2                 |||| |       6
3                 ||||         4
4                 |||          3
Total                          20

Continuous Frequency Distribution

A distribution in which the information is distributed in different classes on
the basis of a continuous variable, i.e. a variable that can take fractional as
well as integer values, is known as a continuous frequency distribution. An
example of a continuous frequency distribution is given below in Table 13.6.

Table 13.6: Frequency Distribution of Heights of 50 Persons
Heights (cm) Tally Mark Frequency
120 -130 ||| 3
130 -140 |||| 5
140- 150 |||| |||| 10
150 -160 |||| |||| |||| 14
160 -170 |||| |||| || 12
170-180 |||| 4
180-190 || 2
Total 50
Having discussed discrete and continuous frequency distributions, let us now
discuss relative and cumulative frequency distributions, which are equally
important from the point of view of data analysis.
Relative Frequency Distribution
A relative frequency corresponding to a class is the ratio of the frequency of
that class to the total frequency. The corresponding frequency distribution is
called relative frequency distribution. If we multiply each relative frequency by
100, we get the percentage frequency corresponding to that class and the
corresponding frequency distribution is called “Percentage frequency
distribution”. Let us take an example in which both relative and percentage
frequency distributions are prepared.
Example 1: A frequency distribution of marks of 50 students in a subject is as
given below:
Class (Marks):  0-10   10-20   20-30   30-40   40-50
Frequency:      6      10      14      18      2
Prepare relative and percentage frequency distributions.
Solution: The relative and percentage frequency distributions can be formed as
given in the following table:
Class (Marks) X   Frequency (f)   Relative frequency (f/N)   Percentage frequency (f/N) × 100
0-10              6               6/50 = 0.12                0.12 × 100 = 12 %
10-20             10              10/50 = 0.20               0.20 × 100 = 20 %
20-30             14              14/50 = 0.28               0.28 × 100 = 28 %
30-40             18              18/50 = 0.36               0.36 × 100 = 36 %
40-50             2               2/50 = 0.04                0.04 × 100 = 4 %
Total             Σf = N = 50     1.00                       100 %
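The arithmetic of Example 1 can be reproduced in a few lines. The following Python sketch is an illustration only (the variable names are chosen here and do not come from the unit); it divides each class frequency by the total N to get the relative frequency and multiplies by 100 to get the percentage frequency.

```python
# Classes and frequencies from Example 1
classes = ["0-10", "10-20", "20-30", "30-40", "40-50"]
frequencies = [6, 10, 14, 18, 2]

total = sum(frequencies)                      # N, the total frequency
relative = [f / total for f in frequencies]   # f / N for each class
percentage = [100 * r for r in relative]      # (f / N) x 100 for each class

for c, f, r, p in zip(classes, frequencies, relative, percentage):
    print(f"{c:<6} {f:>3} {r:>6.2f} {p:>6.0f} %")
print(f"Total  {total:>3} {sum(relative):>6.2f} {sum(percentage):>6.0f} %")
```

The relative frequencies should always add up to 1 and the percentage frequencies to 100, apart from rounding, which is a useful check on the table.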
Cumulative Frequency Distribution
The cumulative frequency of a class is the total of all the frequencies up to and
including that class. A cumulative frequency distribution is a frequency
distribution which shows the observations ‘less than’ or ‘more than’ a specific
value of the variable.
The number of observations less than the upper class limit of a given class is
called the less than cumulative frequency and the corresponding cumulative
frequency distribution is called less than cumulative frequency distribution.

Similarly, the number of observations more than the lower class limit of a
given class is called the more than cumulative frequency, and the corresponding
cumulative frequency distribution is called the 'more than' cumulative
frequency distribution. Following is an example in which 'less than' and 'more
than' cumulative frequency distributions have been obtained.
Example 2: For the following frequency distribution of marks of 50 students in
a subject, form both types of cumulative frequency distributions.
Class (Marks):    0-10   10-20   20-30   30-40   40-50
No. of Students:  7      11      15      12      5

Solution: Cumulative frequency distributions are formed as given in the
following table:
Given Frequency             Less Than Cumulative           More Than Cumulative
Distribution                Frequency Distribution         Frequency Distribution
Classes   No. of Students   Marks less than   No. of       Marks more than   No. of
                                              students                       students
0-10      07                10                07           0                 50
10-20     11                20                18           10                43
20-30     15                30                33           20                32
30-40     12                40                45           30                17
40-50     05                50                50           40                05
Total     50
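Both cumulative columns of Example 2 are running totals of the class frequencies, taken from the top for 'less than' and from the bottom for 'more than'. A minimal Python sketch of this, using the standard library function `itertools.accumulate` (the variable names are chosen for this illustration), is given below.

```python
from itertools import accumulate

# Class limits and frequencies from Example 2
lower_limits = [0, 10, 20, 30, 40]
upper_limits = [10, 20, 30, 40, 50]
frequencies = [7, 11, 15, 12, 5]

# 'Less than' type: running total from the first class downwards
less_than = list(accumulate(frequencies))

# 'More than' type: running total from the last class upwards
more_than = list(accumulate(reversed(frequencies)))[::-1]

for up, lt, lo, mt in zip(upper_limits, less_than, lower_limits, more_than):
    print(f"less than {up:>2}: {lt:>2}    more than {lo:>2}: {mt:>2}")
```

The last 'less than' value and the first 'more than' value must both equal the total frequency, here 50.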

Now, you can try the following exercises.


E3) Construct a discrete frequency distribution for 25 students studying in a
class having the following ages (in years):
20, 21, 19, 18, 20, 20, 19, 18, 21, 19, 22, 21, 18, 19, 21, 22, 19, 18, 20,
19, 20, 22, 20, 21, 20.
E4) Construct a continuous frequency distribution for the 50 students
studying in a class having the following heights (in cm): 146, 156, 152,
167, 178, 180, 172, 162, 148, 153, 161, 173, 163, 174, 147, 179, 148,
151, 168, 172, 165, 173, 172, 180, 175, 145, 153, 154, 162, 164, 170,
172, 160, 161, 158, 152, 163, 165, 170, 168, 158, 149, 155, 160, 150,
149, 167, 176, 169, 159.
Having discussed frequency distributions, we now discuss in the next
sub-section how the concept of frequency distribution can be used to classify
data according to class intervals.
13.2.3 Classification According to Class Intervals
To make data understandable, they are divided into a number of homogeneous
groups or sub-groups. In classification according to class intervals, the
observations are arranged systematically into a number of groups called
classes. Such classification is the most popular in practice. Before discussing
it, we have to define some terms which are used in this type of classification.
(i) Class Limits
The class limits are the lowest and the highest values of a class. For example,
let us take the class 10-20. The lowest value of this class is 10 and the
highest is 20. The two boundaries of a class are known as the lower limit and
the upper limit of the class. The lower limit of a class is the value below
which there cannot be any value in that class. The upper limit of a class is the
value above which no value can belong to that class.
(ii) Class Intervals
The class interval of a class is the difference between the upper class limit and
the lower class limit. For example, in the class 10-20 the class interval is 10
(i.e. 20 minus 10). This is valid in the case of exclusive method discussed in
this subsection later on. If the inclusive frequency distribution (discussed later
on in this subsection) is given then first it is converted to exclusive form and
then class interval is calculated. The size of the class interval is determined by
number of classes and the total range of data.
(iii) Range of Data
The range of data may be defined as the difference between the lower class
limit of the first class interval and the upper class limit of the last class interval.
(iv) Class Frequency
The number of observations corresponding to the particular class is known as
the frequency of that class or the class frequency. In the given frequency
distribution (Table -13.7), the frequency of the class 10-20 is 12 which implies
that there are 12 persons having ages between 10-20. If we add together the
frequencies of all individual classes, we obtain the total frequency.
Table-13.7: Frequency Distribution of 50 Persons having Ages between 0-50 Years.
Classes Frequencies
0-10 08
10-20 12
20-30 15
30-40 10
40-50 05
Total 50

(v) Class Mid Value


It is the value lying halfway between the lower and upper class limits of a
class interval. The mid-point or mid value of a class is defined as follows:
$$\text{Mid value of a class} = \frac{\text{Upper class limit} + \text{Lower class limit}}{2}$$
For the purpose of further calculations in statistical analysis, the mid value
of each class is taken to represent that class.
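The terms defined in (ii), (iii) and (v) above can be illustrated with the classes of Table 13.7. The following Python sketch is only an illustration under the exclusive method; the variable names are assumptions made here and are not part of the unit.

```python
# Classes of Table 13.7 as (lower limit, upper limit) pairs
class_limits = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]

# Class interval: upper class limit minus lower class limit
class_intervals = [upper - lower for lower, upper in class_limits]

# Range of data: upper limit of the last class minus lower limit of the first
data_range = class_limits[-1][1] - class_limits[0][0]

# Mid value: (upper class limit + lower class limit) / 2
mid_values = [(upper + lower) / 2 for lower, upper in class_limits]

print(class_intervals)  # [10, 10, 10, 10, 10]
print(data_range)       # 50
print(mid_values)       # [5.0, 15.0, 25.0, 35.0, 45.0]
```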
Now we are in position to discuss the two methods of classification according
to class intervals, namely “Exclusive Method” and “Inclusive Method”. Let
us discuss these two methods one by one:
Exclusive Method
Under this method, the upper class limit of each class is excluded from that
class interval. The class intervals are fixed so that the upper limit of one
class is the lower limit of the next class. In the following example there are
24 students who have secured marks between 0 and 50. A student who secured 20
marks would be included in the class 20-30, not in 10-20. This method is widely
followed in practice.
Example 3: 24 students appeared in an entrance test in which all questions were
of objective type with 25% negative marking. The marks obtained out of a maximum
of 50 marks are as follows:
17, 16, 7, 30, 21, 42, 44, 36, 22, 22, 25, 31, 31, 34, 30, 36, 35, 45, 25, 15,
20, 42, 40, 30
Prepare a frequency distribution by using the exclusive method.
Solution: The frequency distribution of marks obtained by the above 24 students,
prepared by the exclusive method, is given below in Table 13.8:
Table 13.8: Frequency Distribution of 24 Students by Exclusive Method
Classes   Tally bar    No. of Students
0-10      |            1
10-20     |||          3
20-30     |||| |       6
30-40     |||| ||||    9
40-50     ||||         5
Total                  24
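The exclusive rule, under which a value equal to an upper limit goes into the next class, can be written as a short routine. The following Python sketch is a hedged illustration, not the method prescribed by the unit: the function name `exclusive_frequencies` and its parameters are hypothetical, and it assumes equal class widths and whole-number marks.

```python
# Marks of the 24 students from Example 3
marks = [17, 16, 7, 30, 21, 42, 44, 36, 22, 22, 25, 31,
         31, 34, 30, 36, 35, 45, 25, 15, 20, 42, 40, 30]

def exclusive_frequencies(values, start, width, n_classes):
    """Count values into classes [start, start+width), [start+width, ...), etc.

    The lower limit is included and the upper limit excluded, so a value
    equal to an upper limit falls into the next class.
    """
    counts = [0] * n_classes
    for v in values:
        index = (v - start) // width          # which class the value falls in
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the classes")
        counts[index] += 1
    return [((start + i * width, start + (i + 1) * width), c)
            for i, c in enumerate(counts)]

table = exclusive_frequencies(marks, start=0, width=10, n_classes=5)
for (lower, upper), count in table:
    print(f"{lower}-{upper}: {count}")
print("Total:", sum(count for _, count in table))
```

With these inputs the mark 20 is counted in the class 20-30 and the mark 30 in the class 30-40, in line with the rule stated for the exclusive method.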

Inclusive Method
Under the inclusive method of classification, both the lower class limit and
the upper class limit of a class are included in that class itself. The
following frequency distribution is formed by the inclusive method for the data
of Example 3 given above.
Table 13.9: Frequency Distribution of 24 Students by Inclusive Method
Class     Tally bar    No. of Students
0-9       |            1
10-19     |||          3
20-29     |||| |       6
30-39     |||| ||||    9
40-49     ||||         5
Total                  24

That is, if data are classified in such a way that the lower as well as the
upper class limits are included in the same class interval, it is called an
inclusive class interval.
For converting data from inclusive form to exclusive form, we first find half
of the difference between the lower limit of a class and the upper limit of the
preceding class. This value is then subtracted from the lower limit of each
class and added to the upper limit of each class. In the above example, this
value is (10 – 9)/2 = 0.5. So, the class intervals become –0.5-9.5,
9.5-19.5, …, 39.5-49.5. If none of the observations is negative, the lower
limit of the first class can be taken as 0. In that case the class intervals
are 0-9.5, 9.5-19.5, …, 39.5-49.5.
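The conversion from inclusive to exclusive form described above can be sketched in Python as follows. This is an illustration only; the function name `to_exclusive` is hypothetical, and the sketch assumes that the gap between consecutive classes is the same throughout.

```python
def to_exclusive(inclusive_classes):
    """Convert inclusive classes such as (0, 9), (10, 19) to exclusive form.

    The correction is half the gap between the lower limit of a class and
    the upper limit of the preceding class; it is subtracted from every
    lower limit and added to every upper limit.
    """
    correction = (inclusive_classes[1][0] - inclusive_classes[0][1]) / 2
    return [(lower - correction, upper + correction)
            for lower, upper in inclusive_classes]

# Classes of Table 13.9
inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
exclusive = to_exclusive(inclusive)
print(exclusive)
# [(-0.5, 9.5), (9.5, 19.5), (19.5, 29.5), (29.5, 39.5), (39.5, 49.5)]
```

After the conversion the upper limit of each class coincides with the lower limit of the next, so the classes are continuous and each has a class interval of 10.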
Remark
(i) The lower limit of a class interval is always included in the class in both
the methods discussed above.
(ii) In the exclusive method, the upper limit of a class is not included in the
class. That is why it is called exclusive.
(iii) In the inclusive method, the upper limit of a class is also included in
the class. That is why it is called inclusive.

13.2.4 Principles of Classification


It is difficult to formulate any hard and fast rule for classifying data.
However, the following general considerations may be kept in mind to ensure a
meaningful classification of data:
(1) The whole data should preferably be divided into between 5 and 15
classes. However, there is no rigidity about it. The classes can be more
than 15, depending upon the total number of observations, the variation
between them and the detail required for the given data, but they
should not be fewer than 5, because in that case the classification may
not reveal the essential characteristics.
To determine the approximate number of classes (K), the following
formula is suggested by Sturges:
K = 1 + 3.322 Log N, where K = the approximate number of classes
N = total number of observations
Log = the common logarithm (base 10)
However, the appropriate number of classes to be taken for given data
depends upon personal judgment and other considerations such as the
range of data, the total number of observations, etc.
(2) One should avoid odd (i.e. awkward) values of class intervals as far as
possible, e.g. 3, 7, 11, 26, 39, etc. One should prefer 5 or 10 or a
multiple of 5 or 10 as the class interval, such as 5, 10, 20, 25, 100,
etc., because the human mind is more accustomed to thinking in terms of
multiples of 5 or 10.
(3) The lower class limit of the first class of a frequency distribution should
be either zero or a multiple of 5. For example, if the lowest value of
the data is 26 and we have taken a class interval of 5, then the first
class should be 25-30 instead of 26-31. Similarly, if the lowest value of
the series is 43 and the class interval is 5, then the first class should
be 40-45 instead of 43-48.
(4) To maintain continuity and to get the correct class interval, we should
adopt the exclusive method of classification. However, where the
'inclusive' method has been adopted, it is necessary to make an adjustment
to determine the correct class interval and to maintain continuity.
How the adjustment is made when data are given by the inclusive method is
explained in the previous Sub-section 13.2.3. The same adjustment has been
made to the frequency distribution given in Table 13.9, and the result is
given in Table 13.10:

Table 13.10: Frequency Distribution of 24 Students by Inclusive Method
Classes       No. of Students
–0.5-9.5      01
9.5-19.5      03
19.5-29.5     06
29.5-39.5     09
39.5-49.5     05
Total         24

(5) The intervals of all the classes should be of the same size, because if the
class intervals are not of the same width, it is difficult to make
meaningful comparison between classes. Sometimes the data may require
the inclusion of so many class intervals that the frequency distribution
will become large. Then the classification may be done as follows:
below 10
10-20
20-30
30-40
above 40
These classes are called open end classes and the distribution is known
as open end frequency distribution.
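Considerations (1) and (3) above lend themselves to a quick calculation. The following Python sketch is an illustration only and not part of the original unit: the function names `sturges_classes` and `first_lower_limit` are hypothetical, and Log in the formula for K is taken as the logarithm to base 10, since the constant 3.322 is approximately 1/log10(2).

```python
import math

def sturges_classes(n):
    """Approximate number of classes, K = 1 + 3.322 * log10(N)."""
    return 1 + 3.322 * math.log10(n)

def first_lower_limit(lowest_value, multiple=5):
    """Largest multiple of `multiple` that does not exceed the lowest value."""
    return (lowest_value // multiple) * multiple

# 24 observations, as in Example 3
print(round(sturges_classes(24), 2))   # about 5.59, i.e. 5 or 6 classes

# Lowest values used in consideration (3)
print(first_lower_limit(26))           # 25, so the first class is 25-30
print(first_lower_limit(43))           # 40, so the first class is 40-45
```

The value of K is only a guide. As the text notes, the final choice of the number of classes rests on judgment and on the range and size of the data.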
It may be noted that the frequency distributions, like other types of data
presentation, are always constructed to serve some specific purpose. The
technical requirements outlined above must be supplemented by sound
subjective judgments if proper frequency distributions are to be formed.
Having learnt so much about the classification of data, you will have realised
its importance. So, before moving to the next section, let us outline some of
the main points related to the importance of classification:
• It is a preliminary step for further statistical analysis.
• It facilitates comparison and makes it easy to draw conclusions.
• It facilitates tabulation.
Now, you can try the following exercises.
E5) The marks of 30 students in statistics are given below:
10, 12, 25, 32, 27, 32, 38, 43, 39, 55, 29, 38, 57, 08, 06, 13, 27, 25, 29, 53,
55, 45, 35, 48, 47, 59, 15, 19, 48, 55
Classify the above data by taking a suitable class interval.
E6) The following data give the profits (in crores of Rs.) of 60 companies in
the year 2009-10:
41, 17, 83, 63, 55, 92, 60, 58, 70, 06, 67, 82, 33, 44, 57, 49, 34, 73, 54, 63,
36, 52, 32, 75, 60, 33, 09, 79, 28, 30, 42, 93, 43, 80, 03, 32, 57, 67, 84, 64,
63, 11, 35, 28, 10, 23, 08, 41, 60, 32, 72, 53, 92, 88, 62, 55, 60, 33, 40, 57
Classify the data by the inclusive method.
E7) Present the data given in E6 again, using the principle of adding and
subtracting the correction factor.
13.3 TABULATION OF DATA
One of the simplest and most revealing devices for summarising and presenting
data in a meaningful arrangement is the statistical table. We can define a
statistical table as the logical listing of quantitative data in columns and
rows of numbers with sufficient explanatory statements. The statements may be
given in the form of titles, headings and notes to make clear the full meaning
of the data and their origin.
In other words, a table is a systematic arrangement of statistical data in
columns and rows. Rows are horizontal arrangements, whereas columns are
vertical ones. A table serves to simplify the presentation and to facilitate
comparison. The simplification results from the clear-cut and systematic
arrangement, which enables the reader to quickly locate the desired
information. Comparison is facilitated by bringing related items of information
close together.
13.3.1 Components of a Table
The various components of a table may vary case to case depending upon the
given data. But a good table must contain at least the following components:
1. Table Number
2. Table Heading
3. Caption
4. Stub
5. Body of Table
6. Head Note
7. Foot Note
Let us throw some light on these components one by one:
1. Table Number
A statistical table should be numbered. There are different conventions about
where the table number is placed: it may be shown either in the centre at the
top, above the title, or on the left-hand side of the table at the top. When
there are many columns, it is desirable to number each column as well, so that
it can be referred to easily.
2. Table Heading
A good table should have a suitable heading. The heading is a brief description
of the contents of the table. It should be placed above the table. It should
answer the following questions:
(a) What categories of statistical data are shown?
(b) Where did the data occur?
(c) When did the data occur?
In other words, the heading of the table should be clear, brief and
self-explanatory, although sometimes a long title may have to be used for the
sake of clarity. The title should be so worded that it permits one and only one
interpretation.
3. Caption
Caption refers to the column heading and explains what information the column
presents. It may consist of one or more column headings, i.e. under a column
heading there may be two or more sub-headings. The caption should be clearly
defined and placed at the middle of the column. If different columns are
expressed in different units, the unit should be specified along with the
captions.
4. Stub
The stubs are row headings. They are placed at the extreme left of the table and
perform the same function for the horizontal rows in the table as the captions
do for the vertical columns.
5. Body
The body of the table is the central part of the table and contains the
numerical information presented in the table. This is the most vital part of
the table.
6. Head Note
Head note is a brief explanatory statement applying to all or a major part of
the material presented in the table. It is placed below the title, centred and
enclosed in brackets. It is used to explain points relating to the whole
table that have not been included in the title, the captions or the stubs. For
example, the unit of measurement is frequently written as the head note, such as
“in thousands”, “million tons” or “in crores”.
7. Foot Note
Anything in a table which the reader may find difficult to understand
should be explained in footnotes. Footnotes may be placed directly below the
body of the table. Footnotes are generally used for the following purposes:
(a) To point out any special circumstances affecting the data, for example,
strike, fire, etc.
(b) To clarify anything in the table.
(c) To give the source in the case of secondary data. If any information in the
table is obtained from some journal, its name, date of publication, page
number, table number, etc. should be mentioned, so that a user who wishes
to check the data against the original source knows where to look
for the information.
After discussing the parts of a table, let us discuss the different kinds of
tables through which we can arrange different types of information.
13.3.2 Types of Tables
Tables may broadly be classified into the following two categories:
1. Simple and Complex Tables
2. General Purpose and Special Purpose Tables
1. Simple and Complex Tables
Simple and complex tables are differentiated on the basis of the number of
characteristics presented and studied. If the data based on only one
characteristic are presented, the table is known as a simple table. The simple
table is also known as a one-way table. On the other hand, in a complex table,
two or more characteristics are presented. Complex tables are frequently used in
practice because they make it possible to incorporate full information and to
consider all related facts properly. If the data are tabulated on the basis of
two characteristics, the table is known as a two-way table. If three
characteristics are arranged in a table, the table is known as a treble table.
When four or more characteristics are presented simultaneously, it is known as
manifold tabulation.
The following table presenting the distribution of marks obtained by 100
students in a test is an illustration of a simple table:
Table 13.11: Distribution of Marks Obtained by 100 Students in Statistics
Marks        No. of Students
Below 10       5
10-20          8
20-30         12
30-40         10
40-50         15
50-60         18
60-70         17
70-80         13
Above 80       2
Total        100
Two Way Table


A two-way table shows two characteristics and is formed when either the stub or
the caption, or both, are divided into two categories. The following table,
in which persons are classified by age and by sex, is an illustration of a
two-way table (a complex table):
Table 13.12: Number of Persons Living in a Colony According to Age and Sex
Age             Persons Living in the Colony    Total
                Males        Females
Below 15         12            6                 18
15-25            20           12                 32
25-35            42           27                 69
35-45            25           18                 43
45-55            10            8                 18
55-65             8            5                 13
65 and Above      5            2                  7
Total           122           78                200
Higher Order Table


When three or more characteristics are represented in a table, such a
table is called a higher order table. The need for such a table arises when we
are interested in presenting three or more characteristics simultaneously.
It should be remembered that as the number of characteristics increases, the
table becomes more and more confusing. Normally, it is advised that not more
than four characteristics be represented in the same table. When more than
four characteristics are to be represented, we should form more than one table
depicting the relationships between different attributes.
2. General Purpose and Special Purpose Tables
General purpose tables, also known as reference tables or repository tables,
provide information for general use or reference. They usually contain
detailed information and are not constructed for any specific discussion. In
other words, these tables serve as a repository of information and are arranged
for easy reference. Examples are the tables published by government agencies,
the tables contained in the Statistical Abstract of the Indian Union, tables in
the census reports, etc.
General purpose tables state facts that are not meant for any particular
discussion. If general tables are used by a researcher, they are usually placed
in an appendix at the end of the report for easy reference.
Special purpose tables, also known as summary tables or analytical tables,
provide information for a particular discussion. These tables are also called
derivative tables, since they are often derived from general tables. A special
purpose table should be designed in such a way that the reader can easily refer
to the table for comparison, analysis or emphasis concerning the specific
discussion.
Now, you can try the following exercises.
E8) In a sample survey study about the drinking habits in two cities, it is
observed that, in city X 57% are male, 22% are drinkers, and 14% are
male drinkers, whereas in city Y 52% are male, 28% are drinkers and
21% are male drinkers. Tabulate the above information.
E9) Present the following information in a suitable tabular form:
In 2009, out of a total of 2000 employees in a company, 1550 were members
of a trade union. The number of women employees was 250, out of
which 200 did not belong to any trade union. In 2010, the
number of union employees was 1725, out of which 1600 were men. The
number of non-union employees was 380, among which 155 were
women.
13.4 SUMMARY
In this unit we have covered the concepts of classification and tabulation of
data. That is, we have discussed:
1) Classification of a data set according to the nature of data.
2) The methods of construction of a frequency distribution.
3) The methods of construction of discrete and continuous frequency
distributions.
4) Fundamentals of classification of data according to the class intervals.
5) The methods of construction of relative and cumulative frequency
distributions.
6) Parts of a table.
7) Types of the tables and presenting data into a suitable form of a table.

13.5 SOLUTIONS/ANSWERS
E1) The classification of the data for the production of wheat according to the
given cities can be done in the following way:
Table 13.13: Geographical Classification of the Production of Wheat
Region        Production of Wheat (in '000 kg)
Agra          376
Bhopal        230
Chandigarh    583
Mumbai        136
E2) Classification of the profits of a company from 2001 to 2010 can be done
in the following way:
Table 13.14: Chronological Classification of Profits from 2001 to 2010
Year    Profits (in crores of rupees)    Year    Profits (in crores of rupees)
2001    10                               2006    16
2002    15                               2007    17
2003    13                               2008    21
2004    17                               2009    20
2005    12                               2010    18
E3) Discrete frequency distribution for the given information can be
constructed in the following way:
Table 13.15: Discrete Frequency Distribution of 25 Students According to their Age
Age of the Tally Mark No. of the
students students
18 |||| 04
19 |||| | 06
20 |||| || 07
21 |||| 05
22 ||| 03
Total 25

E4) The continuous frequency distribution for the given information can be
constructed in the following way:
Table 13.16: Continuous Frequency Distribution of 50 Students According to their Heights
Heights (cm) Tally Mark Frequency
145-150 |||| || 07
150-155 |||| || 07
155-160 |||| 05
160-165 |||| |||| 09
165-170 |||| || 07
170-175 |||| |||| 09
175-180 |||| 04
180-185 || 02
Total 50

E5) Let us determine a suitable class interval with the help of the following
formula:
$i = \dfrac{\text{Range}}{1 + 3.322 \log N}$
Here Range = 59 − 06 = 53 and N = 30, so
$i = \dfrac{53}{1 + 3.322 \log 30} = \dfrac{53}{1 + 4.91} = 8.97 \approx 9$
Since values like 3, 7, 9, etc. should be avoided, we will take
10 as the class interval. Let us take the first class as 5-15; thus
the following table is formed:
Table 13.17: Continuous Frequency Distribution of 30 Students According to their
Heights
Heights Tally Mark Frequency
(cm)
05-15 |||| 5
15-25 || 2
25-35 |||| ||| 8
35-45 |||| 5
45-55 |||| 5
55-65 |||| 5
Total 30
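The arithmetic of the class-interval formula used in E5 (and again in E6) can be checked with a short sketch. The function name `class_interval` is a hypothetical helper chosen for illustration, and the logarithm is assumed to be to base 10, which is what reproduces the value 4.91 in the working above.

```python
import math

def class_interval(smallest, largest, n):
    """Approximate class width i = Range / (1 + 3.322 * log10(N))."""
    value_range = largest - smallest
    return value_range / (1 + 3.322 * math.log10(n))

# E5: smallest value 6, largest value 59, N = 30 observations
i_e5 = class_interval(6, 59, 30)
# E6: smallest value 3, largest value 93, N = 60 observations
i_e6 = class_interval(3, 93, 60)

print(round(i_e5, 2), round(i_e6, 2))
```

The formula only suggests a width; rounding it to a convenient value such as 10 remains a judgment made by the person building the table.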
E6) As the least value is 3 and the highest value is 93, using
$i = \dfrac{\text{Range}}{1 + 3.322 \log N} = \dfrac{93 - 3}{1 + 3.322 \log 60} = 13.03 \approx 13$
Since values like 3, 7, 9, 11, 13, etc. should be avoided, we
will take 14 as the class interval. Let us take the first class as 0-14;
thus the following table is formed.
Table 13.18: Frequency Distribution of 60 Companies According to their Profits (Inclusive Method)
Profits (in crores of Rs.) Tally Mark Frequency
0-14 |||| | 06
15-29 |||| 04
30-44 |||| |||| |||| | 16
45-59 |||| |||| 10
60-74 |||| |||| |||| 14
75-89 |||| || 07
90-104 ||| 03
Total 60
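The frequencies in the table above can be recounted from the 60 values given in E6. The sketch below is one possible way to do this; the helper names `inclusive_classes` and `frequencies` are hypothetical, and the `width=15` argument is an assumption that counts both limits of a class such as 0-14.

```python
profits = [
    41, 17, 83, 63, 55, 92, 60, 58, 70, 6, 67, 82, 33, 44, 57, 49, 34, 73, 54, 63,
    36, 52, 32, 75, 60, 33, 9, 79, 28, 30, 42, 93, 43, 80, 3, 32, 57, 67, 84, 64,
    63, 11, 35, 28, 10, 23, 8, 41, 60, 32, 72, 53, 92, 88, 62, 55, 60, 33, 40, 57,
]

def inclusive_classes(start, width, count):
    """Return (lower, upper) limits; both limits belong to the class."""
    return [(start + k * width, start + (k + 1) * width - 1) for k in range(count)]

def frequencies(data, classes):
    """Count the observations falling in each inclusive class."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in classes]

classes = inclusive_classes(start=0, width=15, count=7)
freqs = frequencies(profits, classes)
for (lower, upper), f in zip(classes, freqs):
    print(f"{lower}-{upper}: {f}")
```

Because both limits are included in each class, every value falls in exactly one class and the frequencies add up to the number of observations.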

E7) Table 13.19 below classifies the data by the exclusive method, using the
principle of the correction factor. Here the correction factor is
(15 − 14)/2 = 0.5; it is subtracted from each lower limit and added to each
upper limit of the classes in Table 13.18.
Table 13.19: Frequency Distribution of 60 Companies According to their Profits (Exclusive Method)
Profits (in crores of Rs.) Tally Mark Frequency
−0.5-14.5 |||| | 06
14.5-29.5 |||| 04
29.5-44.5 |||| |||| |||| | 16
44.5-59.5 |||| |||| 10
59.5-74.5 |||| |||| |||| 14
74.5-89.5 |||| || 07
89.5-104.5 ||| 03
Total 60
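The adjustment of class limits in E7 can be sketched as follows. The function name `apply_correction_factor` is hypothetical; the correction factor is taken, as is usual, to be half the gap between the upper limit of one class and the lower limit of the next.

```python
def apply_correction_factor(classes):
    """Convert inclusive class limits to exclusive class boundaries."""
    # Half the gap between the first class's upper limit and the second class's lower limit
    cf = (classes[1][0] - classes[0][1]) / 2
    return [(lower - cf, upper + cf) for lower, upper in classes]

inclusive = [(0, 14), (15, 29), (30, 44), (45, 59), (60, 74), (75, 89), (90, 104)]
boundaries = apply_correction_factor(inclusive)
for lower, upper in boundaries:
    print(f"{lower} - {upper}")
```

After the adjustment the upper boundary of each class coincides with the lower boundary of the next, so the classes become continuous while the frequencies stay unchanged.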

E8) The following table presents the given information regarding the drinkers
in city X and city Y.
Table 13.20: Presentation of Data regarding the Drinkers in City X and City Y in
the form of Two Way Table
(Figures in percentages)
Attributes               City X                       City Y
                 Males   Females   Total      Males   Females   Total
Drinkers          14        8        22        21        7        28
Non-drinkers      43       35        78        31       41        72
Total             57       43       100        52       48       100
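The remaining cells of each city's table follow from the three given percentages by subtraction. A minimal sketch, assuming a hypothetical helper `two_way_table` that takes the percentage of males, of drinkers and of male drinkers:

```python
def two_way_table(pct_male, pct_drinkers, pct_male_drinkers):
    """Fill a 2x2 percentage table from three given percentages (grand total = 100)."""
    pct_female = 100 - pct_male
    female_drinkers = pct_drinkers - pct_male_drinkers
    male_non_drinkers = pct_male - pct_male_drinkers
    female_non_drinkers = pct_female - female_drinkers
    # Each row is (males, females, total)
    return {
        "Drinkers": (pct_male_drinkers, female_drinkers, pct_drinkers),
        "Non-drinkers": (male_non_drinkers, female_non_drinkers, 100 - pct_drinkers),
        "Total": (pct_male, pct_female, 100),
    }

city_x = two_way_table(57, 22, 14)
city_y = two_way_table(52, 28, 21)
print(city_x)
print(city_y)
```

The same subtraction pattern applies to E9, where the known row and column totals are counts rather than percentages.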

E9) The following table shows the trade union membership.
Table 13.21: Presentation of Data regarding the Trade Union
Membership in the Years 2009 and 2010 in the form of Two Way Table
Category                  2009                                  2010
            Trade Union   Non-Union   Total       Trade Union   Non-Union   Total
            Members       Members                 Members       Members
Men         1500          250         1750        1600          225         1825
Women       50            200         250         125           155         280
Total       1550          450         2000        1725          380         2105

UNIT 14 DIAGRAMMATIC PRESENTATION OF DATA
Structure
14.1 Introduction
Objectives
14.2 Diagrammatic Presentation
14.3 One Dimensional or Bar Diagrams
14.4 Two Dimensional Diagrams
14.5 Pie Diagrams
14.6 Pictogram
14.7 Cartogram
14.8 Choice of a Suitable Diagram
14.9 Summary
14.10 Solutions/Answers

14.1 INTRODUCTION
In Unit 13, we discussed the classification and tabulation of data.
Though these methods help to present data in an easy and systematic way,
many people are still not interested in tables. A large group of observations
can confuse the reader, who may then interpret it wrongly. If data are
presented in the form of diagrams, they attract the reader, who then tries to
understand them. Diagrammatic presentation helps in quick understanding of
data. Confirmation of this can be found in the financial pages of newspapers,
journals, advertisements, etc. There are many methods of representing numerical
figures through diagrams, but sometimes it is very difficult to decide which
diagram is the best in a specific situation.
In this unit we will discuss one dimensional, two dimensional and pie
diagrams. Pictograms and cartograms are also discussed in this unit. The unit
ends with a note on the choice of a suitable diagram in a given situation.
Objectives
After studying this unit, you should be able to:
• become familiar with the diagrammatic presentation of data;
• draw suitable bar diagrams for given data;
• draw rectangle and square diagrams for given data;
• draw pie diagrams;
• draw pictograms and cartograms; and
• select an appropriate diagram to represent data.

14.2 DIAGRAMMATIC PRESENTATION OF DATA

Before we discuss the different types of diagrams, let us first look at the
significance of diagrams and the general rules for constructing them.
14.2.1 Significance of Diagrammatic Presentation of Data
The significance of diagrams can be explained by the following points.
(i) Easy Understanding
A large number of observations becomes easy to understand through
diagrams. As the number of observations increases, their analysis tends to
become more tedious, but data presented through diagrams can be understood
easily. As the saying goes, a picture is worth more than 10,000 words.
(ii) Attractive Look
Diagrams look attractive to the eyes. Numbers can be boring, whereas
diagrams are pleasing to the eyes and more impressive than numbers. That is
why a reader pays more attention to the diagrams than to the numbers while
reading a newspaper or magazine. For the same reason, the use of diagrams in
exhibitions, fairs, newspapers and common festivals is increasing day by day.
(iii) Greater Memorising Effect
Diagrams are remembered longer than numbers. Numbers may not be remembered
easily, but diagrams have a greater memorising effect, as the impressions
created by them remain in the mind for a long time.
(iv) Comparison of Data
Through diagrams, one can easily compare data related to different
areas and times. It is difficult to read and compare numbers, whereas
diagrams can be compared easily just by looking at the information presented.
14.2.2 Components of Diagrams
The following components should be considered carefully while constructing
diagrams:
(i) Title of the Diagram
Every diagram should have a suitable title. The title of the diagram should
convey the main idea in as few words as possible, but it should not omit
necessary information. The title is preferably placed at the top of the
diagram.
(ii) Size of the Diagram
A diagram should have a proper size. A proper proportion between the height
and width of the diagram should be maintained. If either the height or the
width is too short or too long in proportion, the diagram would give an odd
impression. There are no fixed rules about the dimensions, but we may follow an
important suggestion given by Lutz in the book entitled “Graphic Presentation”
that the proportion between height and width should be 1:1.414. In this
proportion the diagram looks attractive.
(iii) Scale of the Diagram
Before constructing a diagram, a proper scale should be chosen. There are no
hard and fast rules about the scale. The data concerned and the required
size of the diagram are the guiding factors. The diagram should be neither
too big nor too small. The same scale should be used when diagrams are to be
compared. The scale should be mentioned clearly at the top of the diagram or
below it.

(iv) Footnotes
To clarify certain points about the diagram, footnotes are to be used.
Footnotes may be given at the bottom of the diagram.
(v) Index of Diagram
An Index should be given to illustrate different types of lines or different
types of shades or colours, so that the reader can easily make out the meaning
of the diagram.
(vi) Neat and Clean Diagram
A good diagram should be absolutely neat and clean. Too much information
should not be given in one diagram, otherwise the reader may get confused.
(vii) Simple Diagram
A good diagram should be as simple as possible so that the reader can
understand its meaning clearly; otherwise its complexity can obscure its main
theme.
In the previous two subsections we have explained the significance of diagrams
and the general rules for their construction. In the next subsection we will
just list the types of diagrams. Then, in subsequent sections, we will discuss
each type of diagram in detail.
14.2.3 Types of Diagrams
In practice, various types of diagrams are in use and new ones are constantly
being added. For the sake of application and simplicity several types of
diagrams are categorised under the following heads:
(i) One Dimensional Diagrams or Bar Diagrams
(ii) Two Dimensional Diagrams
(iii) Pie Diagrams
(iv) Pictogram
(v) Cartogram

14.3 ONE DIMENSIONAL OR BAR DIAGRAMS


Bar diagrams are the most commonly used diagrams. The shape of a bar is like a
rectangle filled with some colour (see Example 1). They are called one
dimensional diagrams because only the length of the bar matters and not the
width. That is, the width of each bar remains the same within a diagram, but it
may vary from diagram to diagram depending on the space available and the
number of bars to be presented. For a large number of observations, lines may
be drawn instead of bars to save space.
Following are the special merits of bar diagrams or one dimensional diagrams:
(i) They are easily understood even by those who are not chart minded.
(ii) They are the simplest and easiest in comparing two or more diagrams.
(iii) They are the only form that can be used effectively for comparing a
large number of observations.

After looking at the merits of bar diagrams, you will be keen to know how
bar diagrams are constructed and how many types of bar diagrams are
generally used. The next two subsections address these two questions.
14.3.1 Types of Bar Diagrams
The following are the different types of bar diagrams:
(i) Simple Bar Diagram
(ii) Subdivided Bar Diagram
(iii) Multiple Bar Diagram
(iv) Percentage Bar Diagram
(v) Deviation Bar Diagram
(vi) Broken Bar Diagram
Let us discuss these types of bar diagrams one by one.
(i) Simple Bar Diagram
If data based on one variable are to be represented, the simple bar
diagram can be used. For example, the figures of production, profit, sales,
etc. for various years may be represented with the help of simple bar diagrams.
From a simple bar diagram the reader can easily see the variation in the
characteristic under study with respect to time or some other given factor,
because the width of each bar is the same and only the lengths of the bars
vary. In our representation we will take the lengths of the bars along the
vertical axis and the other given factor along the horizontal axis. Simple bar
diagrams are very popular in practice. For example, while presenting the total
turnover of a company for the last five decades, one depicts only the total
turnover amount in the simple bar diagram. Let us construct a simple bar
diagram in the following example.
Example 1: The profits (in Rs crore) of a company from 1990-91 to 1999-2000
are given below:
Year       Profit (in Rs crore)    Year         Profit (in Rs crore)
1990-91    35.6                    1995-96      87.2
1991-92    46.7                    1996-97      113.1
1992-93    39.8                    1997-98      123.6
1993-94    68.2                    1998-99      119.7
1994-95    93.5                    1999-2000    130.8

Represent these data by a simple bar diagram.


Solution: The simple bar diagram of the above data is given below:
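As a rough text-mode sketch of how the bar lengths follow from the data of Example 1, each profit can be scaled so that the largest value gets the longest bar. The helper name `bar_lengths` and the maximum length of 50 characters are assumptions made only for this illustration, and the bars are printed horizontally here although the unit draws them vertically.

```python
profits = {
    "1990-91": 35.6, "1991-92": 46.7, "1992-93": 39.8, "1993-94": 68.2,
    "1994-95": 93.5, "1995-96": 87.2, "1996-97": 113.1, "1997-98": 123.6,
    "1998-99": 119.7, "1999-2000": 130.8,
}

def bar_lengths(values, max_length=50):
    """Scale each value so that the largest bar is max_length characters long."""
    largest = max(values)
    return [round(v / largest * max_length) for v in values]

lengths = bar_lengths(list(profits.values()))
for (year, profit), length in zip(profits.items(), lengths):
    # All bars have the same width (one text row); only the length varies.
    print(f"{year:>9} | {'#' * length} {profit}")
```

Writing the value next to each bar follows the principle that the reader should be able to read the precise figure without consulting the scale.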

(ii) Subdivided Bar Diagram
If various components of a variable are to be represented in a single diagram,
a subdivided bar diagram is used. For example, the number of members of
teaching staff in the various departments of an institute may be represented by
a subdivided bar diagram. In this diagram each bar is divided into as many
parts as there are components. First of all, the cumulative or total
amount is calculated from the amounts of the components. Then the bar is
divided according to the magnitudes of the components. The length of the bar is
equal to the total of the amounts of the components.
The components of a bar are arranged in order of magnitude, from the largest
component at the base of the bar to the smallest at the top, and the same order
of components is kept in every bar. Different shades or colours
are used to distinguish between different components. To explain these
differences, an index should be given with the bar diagram.
Subdivided bar diagrams can be drawn vertically or horizontally. If the
number of components is more than 10 or 12, subdivided bar diagrams
are not used because, in that case, the diagram would be overloaded with
information and could not easily be compared and understood. Let us see how a
subdivided bar diagram is constructed with the help of the following example:
Example 2: Represent the following data by subdivided bar diagram:
Category                 Cost per chair (in Rs), year wise
                         1990     1995     2000
Cost of Raw Material      15       20       30
Labour Cost               15       18       25
Polish                     5        6       15
Delivery                   5        6       10
Total                     40       50       80
Solution: First of all, we calculate the cumulative cost on the basis of the
given amounts (all costs in Rs):
Category                     1990                 1995                 2000
                        Cost   Cumulative    Cost   Cumulative    Cost   Cumulative
                               Cost                 Cost                 Cost
Cost of Raw Material     15      15           20      20           30      30
Labour Cost              15      30           18      38           25      55
Polish                    5      35            6      44           15      70
Delivery                  5      40            6      50           10      80
Total                    40                   50                   80
On the basis of the above table, the required subdivided bar diagram is given below:
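The cumulative costs in the table above are running totals of the component costs, and each one marks the height at which a segment of the bar ends. A minimal sketch, using the dictionary name `costs` purely for illustration:

```python
from itertools import accumulate

# Component costs per chair in Rs: raw material, labour, polish, delivery
costs = {
    1990: [15, 15, 5, 5],
    1995: [20, 18, 6, 6],
    2000: [30, 25, 15, 10],
}

# Running totals give the height at which each segment of the bar ends.
cumulative = {year: list(accumulate(parts)) for year, parts in costs.items()}
for year, heights in cumulative.items():
    print(year, heights)
```

The last cumulative value for each year equals the total cost, which is the full length of that year's bar.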

(iii) Multiple Bar Diagram
In a multiple bar diagram, we construct two or more bars together. The
multiple bars are constructed either for the different components of a total or
for the magnitudes of different variables. All the bars of one group of data
are drawn together so that the bars of different groups can be compared
properly. The height of each bar is the magnitude of the component it
represents, as in a simple bar diagram. In this diagram a space is left
between the vertical axis and the first bar of the first group, but no
space is left between the bars of the same group. A space must also be left
between two different groups of bars.
In multiple bar diagrams two or more groups of interrelated data are presented.
The technique of drawing such diagrams is the same as that of the simple
bar diagram. The only difference is that, since more than one component is
represented in each group, different shades, colours, dots or crossings are
used to distinguish between the bars of the same group, and the same symbols
are used for the corresponding components of the other groups. Multiple bar
diagrams are very useful when either the number of related components is large
or the change in the values of the components of one variable is important. The
following example illustrates how a multiple bar diagram is drawn for given
data.
Example 3: Draw the multiple bar diagram for the following data.
Year    Sale            Gross profit    Net profit
        (in '000 Rs)    (in '000 Rs)    (in '000 Rs)
1990    100             30              10
1995    120             40              15
2000    130             45              25
2005    150             50              30
2010    200             70              30
Solution: Multiple bar diagram for the above data is given below.

(iv) Percentage Bar Diagram


A subdivided bar diagram drawn on the basis of percentages of the total is
known as a percentage bar diagram. When such diagrams are drawn, the length
of every bar is kept equal to 100 and segments are formed in these bars to
represent the components as percentages of the aggregate. First of
all, the total of the given variable is taken as equal to 100. Then the
percentage is calculated for each component of the variable. After that, the
cumulative percentages are calculated for the components. Finally, the bars are
subdivided at the cumulative percentages and presented like a subdivided bar
diagram. Let us explain the procedure with the help of the example given
below.

Example 4: Draw a percentage bar diagram for the following data:
Category    Cost Per Unit (1990)    Cost Per Unit (2000)
Material    20                      32
Labour      25                      36
Delivery     5                      12
Total       50                      80
Solution: First of all, the percentages and cumulative percentages are obtained
for both years in the various categories.
Category           Cost Per Unit (1990)                  Cost Per Unit (2000)
            Cost   % Cost   Cumulative % Cost     Cost   % Cost   Cumulative % Cost
Material     20      40        40                  32      40        40
Labour       25      50        90                  36      45        85
Delivery      5      10       100                  12      15       100
Total        50     100                            80     100
On the basis of the above table, the required percentage bar diagram is given below:
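The percentage and cumulative percentage columns of the table above can be reproduced with a short sketch. The function name `percentage_segments` is a hypothetical helper introduced only for this illustration.

```python
from itertools import accumulate

def percentage_segments(parts):
    """Return (percentages, cumulative percentages) for the components of one bar."""
    total = sum(parts)
    percentages = [100 * p / total for p in parts]
    return percentages, list(accumulate(percentages))

# Component costs per unit: material, labour, delivery
pct_1990, cum_1990 = percentage_segments([20, 25, 5])
pct_2000, cum_2000 = percentage_segments([32, 36, 12])

print(pct_1990, cum_1990)
print(pct_2000, cum_2000)
```

Since every bar ends at a cumulative percentage of 100, bars for years with different totals have the same length and only their internal divisions differ.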

(v) Deviation Bar Diagram


Deviation bar diagrams are used for representing net quantities, i.e. an excess
or a deficit, such as net profit, net loss, net exports, net imports, etc.
Through this kind of bar we can represent both positive and negative values.
Positive values are drawn above the base line and negative values below it. The
following example explains this type of diagram:
Example 5: Draw a deviation diagram for the following data:
Year    Sale    Net profits
1990    20%     35%
2000    15%     50%
2010    35%     −30%
Solution: Deviation diagram for the given data is shown below:
[Figure: deviation bar diagram of Sale (in %) and Net Profit (in %) for the years 1990, 2000 and 2010; vertical axis from −40% to 60%]

(vi) Broken Bar Diagram


If there is large variation in the values of certain data, i.e. some values
are very small and some are very large, then in order to gain space for the
smaller bars, the large bar(s) may be presented as broken bars.
These bars are similar to the other bars, but they are drawn with a break
because their values differ so much from the others. Let us illustrate the
idea of the broken bar diagram with the help of the following example:
Example 6: Represent the following data by a suitable bar diagram.

Year Sale of cars


1950 200
1960 360
1970 442
1980 520
1990 587
2000 2860

Solution: The sale of cars in the year 2000 is about 14 times that in the year
1950. In order to gain space for the sale figure of the year 1950, we use a
broken bar to represent the sale of cars for the year 2000. The broken bar
diagram for the given data is shown below.

14.3.2 Principles of Construction of Bar Diagrams
(i) The width of each bar must be uniform in a diagram.
(ii) The gap between two bars should be uniform throughout the diagram.
(iii) Bars may be either horizontal or vertical. The vertical bars should be
preferred because they give a better look than horizontal bars and also
facilitate comparison. We will use vertical bars in our presentation.
(iv) The respective figures should also be written at the top of the bars so
that the reader is able to know the precise value without looking at the scale.
Now, you can try the following exercises:
E1) Represent the following data by a suitable diagram:
Years:                         2005   2006   2007   2008   2009   2010
Enrollment of the students:    280    294    302    270    325    406
E2) Represent the following data by a suitable bar diagram:
Year: 2007-08 2008-09 2009-10
Gross Income: 440 480 520
Gross Expenditure: 410 440 490
Net Income: 160 180 175
Tax: 180 165 190
E3) Represent the following information by a suitable diagram:
Class Average marks Average Marks Average Marks
in Mathematics in Statistics in Physics
A 58 70 65
B 62 68 72
E4) Draw a suitable diagram for given expenditure data of two families.
Item Family A Family B
Food 300 350
Clothing 250 200
Education 280 300
Others 220 200
E5) Draw a suitable diagram to represent the following information:
Item Company A Company B
Selling Price 9500 8000
Raw Material 5500 6500
Direct Wages 3500 4000
Rent of Office 1500 1500

14.4 TWO DIMENSIONAL DIAGRAMS


In one dimensional diagrams only the length of the bar is important and
bars are compared on the basis of their lengths only, while in two
dimensional diagrams both the length and the width of the bars are considered,
i.e. in two dimensional diagrams the given numerical figures are represented by
the areas of the bars. So, two dimensional diagrams are also known as “Area
Diagrams”.
The following are the types of two dimensional diagrams:
(i) Rectangles
(ii) Squares
(iii) Circles
Let us discuss these one by one:
(i) Rectangles Diagram
In a rectangles diagram the given numerical figures are represented by the
areas of the rectangles. We know that area of a rectangle = (length) × (breadth).
So, a rectangles diagram is drawn by taking one of the two variables as the
lengths and the other variable as the breadths of the rectangles along the two
axes. To understand this diagram, go through the following illustration.
Example 7: Two companies A and B produce the same item. Company A
produced 2000 units in January 2011 and in the same month company B
produced 2400 units. The production cost per unit for company A and
company B was Rs 12 and Rs 10.5 respectively. Represent these facts by using
rectangles diagram.
Solution: The rectangles for the two companies are to be drawn on the following
basis:
Company A
Length = 2000 (total produced units)
Breadth = 12 (per unit production cost)
Area = 2000 × 12 = 24000
Company B
Length = 2400 (total produced units)
Breadth = 10.5 (per unit production cost)
Area = 2400 × 10.5 = 25200
Therefore, the lengths and breadths of the rectangles of these companies will
be in the proportions 2000 : 2400 and 12 : 10.5 respectively. The areas
calculated above for the two companies from their lengths and breadths
represent the total production costs of the two companies. These rectangles are
represented below.
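The areas in Example 7 can be recomputed with a few lines. The dictionary layout below is only an illustrative assumption about how the data might be stored.

```python
companies = {
    "A": {"units": 2000, "cost_per_unit": 12},
    "B": {"units": 2400, "cost_per_unit": 10.5},
}

# Area of each rectangle = length x breadth = total production cost (in Rs)
areas = {name: c["units"] * c["cost_per_unit"] for name, c in companies.items()}
print(areas)
```

Although company B has the lower cost per unit, its rectangle has the larger area because it produces more units.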

(ii) Squares Diagram


When the variation between the given numerical figures is high, a
squares diagram is more suitable than a rectangles diagram. As in the
rectangles diagram, here the given numerical figures are represented by the
areas of squares. We know that area of a square = (side) × (side) = $(\text{side})^2$. So, we take
$(\text{side})^2 = \text{given numerical figure} \Rightarrow \text{side of square} = \sqrt{\text{given numerical figure}}$
Remember that the base line would be the same for all squares.
In other words, we follow the following steps for the construction of the square
diagram:
Step 1 Take the given numerical observations/figures as areas of the
corresponding squares.
Step 2 Take square roots of the given numerical observations/figures as sides
of the corresponding squares.
Step 3 Construct the corresponding squares like rectangle diagrams.
Let us discuss the method of drawing the square diagram with the help of the
following example:
Example 8: Represent the following data of the number of schools in a city A
from 1970-80 to 2000-10 in a square diagram.
Years                          1970-80   1980-90   1990-2000   2000-10
Number of schools in city A    4         9         36          64

Solution:
Step 1 Areas of the corresponding squares = 4, 9, 36, 64
Step 2 Sides of the corresponding squares = $\sqrt{4}, \sqrt{9}, \sqrt{36}, \sqrt{64}$ = 2, 3, 6, 8
Step 3 Square diagram for the given data is shown below.

Remark 1: If the given observations are large, so that their square roots are
also large, we can adjust the scale in the usual way.
For example, suppose the given observations are 256, 1600, 5184, 9216. Then the
sides of the squares will be
$\sqrt{256}, \sqrt{1600}, \sqrt{5184}, \sqrt{9216} = 16, 40, 72, 96.$
Here we can adjust the scale by taking 16 units = 1 unit, after which the sides
of the squares reduce to 1, 2.5, 4.5, 6. Using these sides, we can construct the
square diagram as in the above example, provided we mention the scale used
(i.e. 16 units = 1 unit along both axes) in the top right corner.
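The arithmetic of Example 8 and Remark 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `square_sides` and its parameter `units_per_scale_unit` are hypothetical names introduced here, not part of the unit.

```python
import math

def square_sides(values, units_per_scale_unit=1.0):
    """Return the side of each square when the values are taken as areas.

    units_per_scale_unit is a hypothetical scale parameter:
    16 means "16 units = 1 unit" on the drawing.
    """
    return [math.sqrt(v) / units_per_scale_unit for v in values]

schools = [4, 9, 36, 64]             # data of Example 8
large = [256, 1600, 5184, 9216]      # data of Remark 1

example_sides = square_sides(schools)
remark_sides = square_sides(large, units_per_scale_unit=16)

print(example_sides)   # sides before any scale adjustment
print(remark_sides)    # sides after taking 16 units = 1 unit
```

Dividing every side by the same scale factor keeps the ratios of the sides, and hence of the areas, unchanged, which is why the adjusted diagram still represents the data faithfully.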
(iii) Circles Diagram
Another form of two dimensional diagram is the circles diagram. In the squares
diagram we took the given numerical figures/observations as the areas of the
corresponding squares. Similarly, here we take the given numerical
figures/observations as the areas of the corresponding circles. We know that
Area (A) of a circle $= \pi r^2$, where r is the radius of the circle

$\Rightarrow A \propto r^2$, read as "A is proportional to $r^2$"
(if $y = ax$, where a is a constant, then we say that y is proportional to x)

$\Rightarrow \pi r^2 =$ Given numerical figures/observations

$\Rightarrow r^2 \propto$ Given numerical figures/observations, as $\pi$ is constant

$\Rightarrow r \propto$ Square roots of the given numerical figures/observations
Therefore, we follow the following steps for the construction of the circle
diagram:
Step 1 Take the given numerical observations/figures as areas of the
corresponding circles.
Step 2 Take squares of the radii ($r^2$) of the corresponding circles proportional
to the given numerical figures/observations.
Step 3 Take radii (r) of the corresponding circles proportional to the square
roots of the given numerical figures/observations.
Step 4 Construct the corresponding circles like rectangles/squares diagrams.
The circles diagram is the simplest of the two dimensional diagrams used for
illustrating totals having large differences among them, like the squares
diagram. But a circles diagram looks more attractive than a squares diagram, and
therefore the circles diagram is more popular than the squares diagram. As many
circles are drawn as there are totals to be represented.
Let us discuss the method of drawing the circles diagram with the help of the
following example:
Example 9: Draw a circles diagram for the data given in Example 8.
Solution: Using the data of Example 8 for drawing a circles diagram, we have
Step 1 Areas of the corresponding circles ($\pi r_1^2, \pi r_2^2, \pi r_3^2, \pi r_4^2$) = 4, 9, 36, 64
Step 2 Squares of the radii of the corresponding circles are proportional to
4, 9, 36, 64, i.e. $r_1^2, r_2^2, r_3^2, r_4^2 \propto 4, 9, 36, 64$
Step 3 Radii of the corresponding circles are proportional to
$\sqrt{4}, \sqrt{9}, \sqrt{36}, \sqrt{64}$,
i.e. $r_1, r_2, r_3, r_4 \propto \sqrt{4}, \sqrt{9}, \sqrt{36}, \sqrt{64} = 2, 3, 6, 8$
Step 4 Circles diagram for the given data is shown below. Radii of the circles
lie on the dotted line.
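As a rough check of Steps 1 to 3, the following Python sketch computes the radii and confirms that the resulting areas stay proportional to the data. The function name `circle_radii` and the scale constant `k` are assumptions made for this illustration only.

```python
import math

def circle_radii(values, k=1.0):
    """Radii proportional to the square roots of the values.

    k is an arbitrary (hypothetical) scale constant for the drawing.
    """
    return [k * math.sqrt(v) for v in values]

schools = [4, 9, 36, 64]                  # data of Example 8
radii = circle_radii(schools)

# Area of each circle divided by the value it represents.
# The quotient is the same constant (pi * k**2) for every circle.
areas = [math.pi * r ** 2 for r in radii]
ratios = [a / v for a, v in zip(areas, schools)]

print(radii)
print(ratios)
```

Because the quotient is constant, any convenient value of `k` may be chosen to fit the circles on the page.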

Remark 2: Here also we can follow an approach similar to the one discussed in
Remark 1.

Now, you can try the following exercises.


E6) Draw a suitable diagram to represent the following data:

               Rate per item    Sale of item
Company P      20               400
Company Q      30               600

E7) Draw a squares diagram for the data given below:

Year                             1980    1990    2000    2010
Number of colonies in city A     16      25      65      150

E8) Draw a circles diagram for the data given in E7).

14.5 PIE DIAGRAMS


A pie diagram/chart is used when we need to know the relationship between the
whole of a thing and its parts, i.e. a pie chart shows how the entire thing is
divided up into different parts. For example, suppose the total monthly
expenditure of a family is Rs 1000, of which Rs 250 is spent on food, Rs 200 on
education, Rs 100 on rent, Rs 150 on transport and Rs 300 on miscellaneous
items. This tells us that 25%, 20%, 10%, 15% and 30% of the total expenditure of
the family are spent on food, education, rent, transport and miscellaneous items
respectively. Here we note that if the money spent on food (say) increases from
25% to 30%, then the percentages of the other head(s) must shrink so that the
total remains 100%. Similarly, if the money spent on any one of the heads
decreases, then the percentages of the other head(s) must expand so that the
total remains 100%. That is why a pie chart gives the relationship between the
whole and its parts.
Steps used for constructing a pie chart:
Step 1 Find the total of the different parts.
Step 2 Find the sector angle (in degrees) of each part, keeping in mind that
the total angle around the centre of a circle is 360°.
Step 3 Find the percentage of each part, taking the total obtained in Step 1 as
100 percent.
Step 4 Draw a circle and divide it into sectors, where each sector (or area of
the sector) of the circle, with the corresponding angle obtained in
Step 2, represents the size of the corresponding part. The diagram thus
obtained is the pie chart fitted to the given data.

Let us explain the procedure with the help of the following example:
Example 10: A company is started by the four persons A, B, C and D and
they distribute the profit or loss between them in proportion of 4 : 3 : 2 : 1 . In
year 2010 company earned a profit of Rs 14400. Represent the shares of their
profits in a pie chart.
Solution: Given ratio is 4 : 3 : 2 : 1
Therefore, the sum of the ratios = 4 + 3 + 2 + 1 = 10

Calculation of Degrees and Percentages

Partners  Profits (in Rs)                       Sector Angles (in degrees)                                                  Percentages
A         $14400 \times \frac{4}{10} = 5760$    $\frac{5760}{14400} \times 360 = 144$ or $\frac{4}{10} \times 360 = 144$    $\frac{5760}{14400} \times 100 = 40$
B         $14400 \times \frac{3}{10} = 4320$    $\frac{4320}{14400} \times 360 = 108$ or $\frac{3}{10} \times 360 = 108$    $\frac{4320}{14400} \times 100 = 30$
C         $14400 \times \frac{2}{10} = 2880$    $\frac{2880}{14400} \times 360 = 72$ or $\frac{2}{10} \times 360 = 72$      $\frac{2880}{14400} \times 100 = 20$
D         $14400 \times \frac{1}{10} = 1440$    $\frac{1440}{14400} \times 360 = 36$ or $\frac{1}{10} \times 360 = 36$      $\frac{1440}{14400} \times 100 = 10$
Total     14400                                 360                                                                         100
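The calculation of sector angles and percentages can also be sketched in Python. This is a minimal illustration of Steps 1 to 3 using the figures of Example 10; the function name `pie_table` is a hypothetical helper introduced here.

```python
def pie_table(parts):
    """Map each part to (sector angle in degrees, percentage of the whole)."""
    total = sum(parts.values())                      # Step 1: total of the parts
    return {
        name: (360 * value / total,                  # Step 2: sector angle
               100 * value / total)                  # Step 3: percentage
        for name, value in parts.items()
    }

ratio = {"A": 4, "B": 3, "C": 2, "D": 1}             # profit-sharing ratio
profit = 14400                                       # profit in Rs
shares = {name: profit * r / sum(ratio.values()) for name, r in ratio.items()}

table = pie_table(shares)
for name, (angle, percent) in table.items():
    print(name, shares[name], angle, percent)
```

The angles always add up to 360 and the percentages to 100, which gives a quick check on the arithmetic before the circle is drawn.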

On the basis of the above calculation, the pie chart showing the shares of
profit of the four partners is given below:

[Figure: Pie chart of profits (in Rs). Sectors: Partner A 40%, Partner B 30%,
Partner C 20%, Partner D 10%]

Note:
(i) In drawing the components on the pie diagram, it is advisable to follow
some logical arrangement, pattern or sequence. For example, arrange them
according to size, with the largest on top and the others in sequence
running clockwise.
(ii) Pie chart is used only when
(a) total of the parts makes a meaningful whole. For example, the total of
the expenditures of a family on different items makes a meaningful
whole, but if in a city there are 100 doctors, 40 engineers, 50 milkmen
and 80 businessmen, then the total of these does not make a meaningful
whole, so a pie chart should not be used here.
(b) observations in different parts are mutually exclusive. For example in
the situation discussed in part (a) a businessman may also be an
engineer so the observations in different parts are not mutually
exclusive.
(c) observations of the different parts are observed at the same time.
We have discussed the method of drawing a pie diagram in this section. Let us
now discuss some limitations of the pie diagram.

Limitation of Pie Diagram


The following are the limitations of the pie diagram/chart:
(i) For accurate reading and interpretation, particularly when the data are
divided into a large number of components or the differences among the
values of the components are very small, the pie diagram is less effective
than the bar diagrams.
(ii) Attractiveness of a pie chart suffers if the number of parts of the whole
is more than 7 or 8. That is, a pie chart should be avoided if the number
of parts of the whole is more than 7 or 8.
(iii) In terms of comparison, the pie diagram appears inferior to the simple
bar diagram or the divided bar diagram.
(iv) Pie chart is used only when the total of the parts makes a meaningful
whole.
(v) Pie chart should not be used if observations of the different parts are not
mutually exclusive.
(vi) Pie chart should not be used if observations of the different parts are
observed at different times.
Now, you can try the following exercises.
E9) Represent the following data of utilization of 100 paise of income by
XYZ company in year 2009-10.

Item/Head                           Money spent (in paise)
Manufacturing Expenses              42
Salaries of employees               14
Selling and distribution Expenses   8
Interest Charges                    6
Advertisement Expenses              15
Excise duty of sales                5
Taxation                            10

E10) Draw a pie diagram to represent the expenditure of Rs 100 by a family
over the different budget heads given below:

Item             Expenditure (in Rs.)
Food             25
Clothing         15
Education        20
Transport        10
Outing           10
Miscellaneous    5
Saving           15

14.6 PICTOGRAM
Pictograms, also known as picture grams, are very frequently used in
representing statistical data. Pictograms are drawn with the help of pictures.
These diagrams indicate the nature of the represented facts. Pictograms are
attractive and easy to comprehend, and as such this method is particularly
useful in presenting statistics to the layman.
The pictures used as symbols to represent the units or values of any variable
or commodity should be selected carefully. The picture symbol must be
self-explanatory in nature. For example, if the increase in the number of
airline companies is to be shown over a period of time, then the appropriate
symbol would be an aeroplane.
The pictograms have the following merits:
(i) The magnitudes of the variables may be known by counting the pictures.
(ii) An illiterate person can also get the information.
(iii) The facts represented in a pictorial form can be remembered longer.

Example 11: Draw a pictogram for the data of production of tea (in hundred
kg) in a particular area of Assam from year 2006 to 2010.

Year                              2006   2007   2008   2009   2010
Production of Tea (in 100 kg.)    2.5    3.0    4.0    5.5    7.0

Solution: Pictogram for the production of tea in a particular area of Assam
from year 2006 to 2010 is shown below:

Now, you can try the following exercise.


E11) Draw a pictogram for production of mangoes in a particular area of
Maharashtra from 2006 to 2010.

Year                                2006   2007   2008   2009   2010
Production of Mangoes (in tons)     5.0    4.5    6.0    3.5    5.5

14.7 CARTOGRAM
Representation of numerical facts with the help of a map is known as a
cartogram. By representing the facts on maps, the impact of the results on
different geographical areas may be shown and compared. Maps are helpful in the
comparative study of various districts of a state or different states of a
country. For example, the production of wheat in different geographical areas
can be represented by a cartogram. The quantities on the map can be shown in
many ways, such as through shades or colours, by dots, by placing pictograms in
each geographical area, or by writing the appropriate numerical figure in each
geographical area.
Let us take an example to get a look at the cartogram.
Example 12: Density of population per square kilometer in different states and
union territories of India according to the 2011 census data is given below.
State/Union    Density         State/Union    Density         State/Union        Density
Territory      (per sq. km.)   Territory      (per sq. km.)   Territory          (per sq. km.)
Andhra P       308             Kerala         859             Tripura            350
Arunachal P    17              Madhya P       236             Uttarakhand        189
Assam          397             Maharashtra    365             Uttar P            828
Bihar          1102            Manipur        122             West Bengal        1029
Chhattisgarh   189             Meghalaya      132             Andaman and N I    46
Goa            394             Mizoram        52              Chandigarh         9252
Gujarat        308             Nagaland       119             Dadra and N H      698
Haryana        573             Orissa         269             Daman and Diu      2169
Himachal P     123             Punjab         550             Delhi              11297
J and K        124             Rajasthan      201             Lakshadweep        2013
Jharkhand      414             Sikkim         86              Pondicherry        2598
Karnataka      319             Tamil Nadu     555
Represent the above data with the help of cartogram.
Solution: Cartogram for the above data is given below:

14.8 CHOICE OF A SUITABLE DIAGRAM


In Secs. 14.3 to 14.7 we have studied many types of diagrams, so a reasonable
question may arise in your mind: how do we know which diagram is suitable in a
given situation? It is not easy to answer this question absolutely, because
there are situations in which more than one diagram may be used, and moreover
this is not a complete list of diagrams. Even so, there are some suggestions
which may help you to select an appropriate diagram in a given situation.
The choice would primarily depend upon two factors, namely:
(i) The Nature of the Data: The nature of the data would determine whether to
use one dimensional, two dimensional or three dimensional diagrams and, if
it is one dimensional, whether it is simple, sub-divided, multiple or some
other type. A cubic diagram would be preferred to a bar diagram if the
magnitudes of the figures are very wide apart.
(ii) The Type of People for whom the Diagram is to be made: For drawing the
attention of an uneducated mass, pictograms or cartograms are more
effective.
Some more points which may address the question raised are given below:
• Simple bar diagrams should be used when changes in totals are required
to be represented.
• Sub-divided bar diagrams are more useful when changes in totals as well
as in component figures (absolute ones) are required to be represented.
• Multiple bar diagrams should be used where changes in the absolute
values of the component figures are to be emphasised and the overall
total is of no importance.
• The multiple and sub-divided bar diagrams are used for not more than
four or five components. For more than five components, pie diagrams
will be the best choice.
• Percentage bar diagrams are better choices when changes in the relative
size of component figures are to be displayed.
• Pictograms and cartograms are very elementary forms of visual
presentation.
• The pictogram is admirably suited to the publication of articles in
newspapers and magazines or in reports.
• Cartograms or statistical maps are particularly effective in bringing out
the geographical pattern that may be hidden in the data.

14.9 SUMMARY
This unit covered the diagrammatic presentation of the data. In this unit, we
have discussed:
1) One dimensional diagrammatic presentation of the data.
2) How to draw different types of bar diagrams.
3) How to draw two dimensional diagrams to represent the given data.
4) How to draw Pie diagram.
5) How to draw Pictograms and Cartograms for the pictorial representations.
6) The selection of an appropriate diagram to represent the data of a given
situation.

14.10 SOLUTIONS/ANSWERS
E1) The suitable diagram in this case is the simple bar diagram, which is shown
below:

[Figure: Simple bar diagram of enrollments of the students (in numbers) for the
years 2005 to 2010]

E2) The suitable diagram in this case is multiple bar diagram which is
shown as follows:

E3) The suitable diagram in this case is multiple bar diagram which is
shown as follows:

E4) The suitable diagram in this case is subdivided bar diagram which is
shown as follows:

E5) The suitable diagram in this case is the percentage bar diagram. So first of
all we have to calculate the percentage and cumulative percentage for both
the companies in the various categories as given below:

Category         Company A                               Company B
                 Cost     % Cost   Cumulative % Cost     Cost     % Cost   Cumulative % Cost
Selling price    9500     47.5     47.5                  8000     40       40
RM               5500     27.5     75                    6500     32.5     72.5
DW               3500     17.5     92.5                  4000     20       92.5
ROO              1500     7.5      100                   1500     7.5      100
Total            20000    100                            20000    100
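The percentage and cumulative percentage columns of this table can be reproduced with a short Python sketch. The helper name `percentage_table` is hypothetical and used only for illustration.

```python
from itertools import accumulate

def percentage_table(costs):
    """Return (percentages, cumulative percentages) for a list of costs."""
    total = sum(costs)
    percentages = [100 * c / total for c in costs]
    return percentages, list(accumulate(percentages))

# Costs in the order: selling price, RM, DW, ROO
company_a = [9500, 5500, 3500, 1500]
company_b = [8000, 6500, 4000, 1500]

pct_a, cum_a = percentage_table(company_a)
pct_b, cum_b = percentage_table(company_b)

print(pct_a, cum_a)
print(pct_b, cum_b)
```

The cumulative percentages mark the heights at which each bar of the percentage bar diagram is divided, so the last value should always be 100.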

On the basis of the above calculation, the percentage bar diagram is given below:

E6) The suitable diagram in this case is the rectangles diagram. The rectangles
for both companies are to be drawn on the following basis.
Company P
Length = 400 (items sold)
Breadth = 20 (rate per item)
Area = $400 \times 20 = 8000$
Company Q
Length = 600 (items sold)
Breadth = 30 (rate per item)
Area = $600 \times 30 = 18000$
Therefore, the lengths and breadths of the two rectangles will be in the
proportions 400 : 600 and 20 : 30 respectively. The areas calculated for
both companies on the basis of the lengths and breadths given above
represent the total cost of the companies. These rectangles are
represented below.

E7) Step 1 Areas of the corresponding squares = 16, 25, 65, 150
Step 2 Sides of the corresponding squares
= $\sqrt{16}, \sqrt{25}, \sqrt{65}, \sqrt{150}$
= 4, 5, 8.06, 12.25
Here we can adjust the scale (as discussed in Remark 1).
Let us take 4 units = 1 unit, then we have
Sides of the corresponding squares = 1, 1.25, 2.02, 3.06
Step 3 Square diagram for the given data is shown below:

E8) Following steps similar to those in Example 9, we have
$r_1, r_2, r_3, r_4 \propto \sqrt{16}, \sqrt{25}, \sqrt{65}, \sqrt{150} = 4, 5, 8.06, 12.25.$
Here we can adjust the scale (as discussed in Remark 1).
Let us take 4 units = 1 unit, then we have
$r_1, r_2, r_3, r_4 \propto 1, 1.25, 2.02, 3.06.$
Circles diagram for the given data is shown below:

E9) The suitable diagram in this case is the pie diagram. Calculation of degrees
and percentages (as we did in Example 10) is an exercise for you. On
the basis of the calculation, the pie chart which shows the utilization of
100 paise of income by XYZ company in year 2009-2010 is shown below:

E10) The suitable diagram in this case is pie diagram. Calculation of degrees
and percentages (as we did in Example 10) is an exercise for you. On
the basis of calculation, pie chart which shows the expenditure of a
family on different items is shown below:

E11) We represent the production of mangoes for the different years by
pictures of a mango, according to the magnitudes of the data (taking
1 mango picture = 1 ton of mangoes).

UNIT 15 GRAPHICAL PRESENTATION OF DATA-I
Structure
15.1 Introduction
Objectives
15.2 Graphical Presentation
15.3 Types of Graphs
Histogram
Frequency Polygon
Frequency Curve
Ogive
15.4 Summary
15.5 Solutions/Answers

15.1 INTRODUCTION
An important function of Statistics is to present complex and huge data in
such a way that they can be easily understood. In the previous unit, we
discussed the diagrammatic presentation of data and became familiar with some
of the most commonly used diagrams. We now move on to the graphical
presentation of data. Graphs are plotted for frequency distributions and are
used to interpolate/extrapolate items in a series, including locating various
partition values. In this unit, we shall discuss some of the most useful and
commonly used graphs.
The graphical presentation can be divided into two categories:
(i) Graphs for frequency distributions.
(ii) Graphs for time series.
In this unit, we will concentrate on graphs for frequency distributions only.
In this regard, we will discuss the most commonly used graphs for frequency
distributions, i.e. histogram, frequency polygon, frequency curve and
cumulative frequency curves (ogives).
Objectives
After studying this unit, you would be able to:
• describe the graphical presentation;
• explain the advantages of graphical presentation;
• draw the histogram for continuous frequency distribution;
• draw the frequency polygon for a frequency distribution;
• draw the frequency curves of different shapes; and
• draw the cumulative frequency curves.

15.2 GRAPHICAL PRESENTATION
A graphical presentation is a geometric image of a set of data. Graphical
presentation is done for both frequency distributions and time series. Unlike
diagrams, graphs are used to locate partition values like the median,
quartiles, etc., in particular, and to interpolate/extrapolate items in a
series, in general. They are also used to measure absolute as well as relative
changes in the data. Another important feature of graphs is that once a person
has seen a graph, its picture stays in his/her mind for a long time. Graphs
also help us in studying the cause and effect relationship between two
variables. The graph of a frequency distribution presents huge data in an
interesting and effective manner and brings to light the salient features of
the data at a glance. Before closing this section, let us see some advantages
of graphical presentation.
Advantages of Graphical Presentation
The following are some advantages of the graphical presentation:
• It simplifies the complexity of data and makes it readily understandable.
• It attracts the attention of people.
• It saves the time and effort needed to understand the facts.
• It makes comparison easy.
• A graph describes the relationship between two or more variables.
After going through the advantages of graphical presentation of data, you may
be keen to know the commonly used graphs to represent the data and how these
graphs are drawn. The next section will address these issues.

15.3 TYPES OF GRAPHS


Nowadays a large variety of graphs are in practical use. However, we shall
discuss only some important graphs which are more popularly used in practice.
The various types of graphs can be divided broadly under the following two
heads:
(i) Graphs of Frequency Distributions
(ii) Graphs for Time Series Data
In this section we will focus on graphs for frequency distributions; graphs for
time series data will be discussed in the next unit, i.e. Unit 16.
Graphs of Frequency Distributions
The graphical presentation of frequency distributions is drawn for discrete as
well as continuous frequency distributions.
Let us first consider the frequency distribution of a discrete variable.
To represent a discrete frequency distribution graphically, we take two
rectangular axes of co-ordinates, the horizontal axis for the variable and the
vertical axis for the frequency. The different values of the variable are then
located as points on the horizontal axis. At each of these points, a perpendicular
bar is drawn to present the corresponding frequency.
Such a diagram is called a ‘Frequency Bar Diagram’. For example, if we take
the frequency distribution for the number of peas per pod for 198 pods as given
in Table 15.1:

Table 15.1
No. of peas per pod            1    2    3    4    5    6    7
Frequency (number of pods)     14   23   66   40   26   18   11

Then, the frequency bar diagram is shown in Fig. 15.1:

Fig. 15.1: Frequency Bar Diagram for the Frequency Distribution of Number of the Peas for 198 Pods.

Note 1: O represents origin and choice of scale used along horizontal and
vertical axes depends upon given data.
Now, we take the case of frequency distribution of a continuous variable.
The following are the most commonly used graphs for continuous frequency
distributions:
(i) Histogram
(ii) Frequency Polygon
(iii) Frequency Curve
(iv) Cumulative Frequency Curve or Ogives
Let us discuss these one by one:
(i) Histogram
In the previous example, we discussed how a graph is drawn for a discrete
frequency distribution.
For a continuous frequency distribution, a better way to represent the data
graphically is to use a histogram. A histogram is drawn by constructing
adjacent rectangles over the class intervals such that the areas of the
rectangles are proportional to the corresponding class frequencies.
A histogram is similar to a bar diagram, but it represents a frequency
distribution with continuous classes. The width of each bar is equal to the
width of its class interval. Each rectangle is joined to the next so as to give
a continuous picture.
The class boundaries are located on the horizontal axis. If the class intervals
are of equal size, the heights of the rectangles will be proportional to the
class frequencies themselves. If the class intervals are not of equal size, the
heights of the rectangles will be proportional to the ratios of the frequencies
to the widths of the corresponding classes. In other words, the frequencies of
the class intervals having the least width are written as they are, and the
frequencies of the other class intervals are adjusted as follows:

$\dfrac{\text{Given frequency}}{\text{Width of its class interval}} \times \text{The least width}$ … (15.1)
Let us draw a histogram for the frequency distribution given in Table 15.2.
Table 15.2
Class Intervals   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
Frequency         2      3       13      18      9       7       6       2

Histogram for the above data is given below.

Fig. 15.2: Histogram for Frequency Distribution when Class-intervals are of Equal Width.

Now, let us consider the frequency distribution for unequal class intervals as
given in the Table 15.3
Table 15.3
Class 0-10 10-20 20-30 30-40 40-70 70-80 80-100
Frequency 20 32 8 2 60 35 10
As this is a case of unequal class intervals, we have to adjust the frequencies
of the classes 40-70 and 80-100 by the formula given in equation (15.1). These
calculations are shown in Table 15.4 given below:
Table 15.4
Class Interval (CI)   Frequency   Width of (CI)   Heights of the rectangles
0-10                  20          10              20
10-20                 32          10              32
20-30                 8           10              8
30-40                 2           10              2
40-70                 60          30              $(60/30) \times 10 = 20$
70-80                 35          10              35
80-100                10          20              $(10/20) \times 10 = 5$
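The adjustment of equation (15.1) can be sketched in Python as follows. This is an illustrative sketch only; the function name `adjusted_heights` and the tuple layout `(lower, upper, frequency)` are assumptions made here.

```python
def adjusted_heights(classes):
    """Heights of histogram rectangles for possibly unequal class widths.

    classes: list of (lower limit, upper limit, frequency).
    Applies (frequency / class width) * least width, as in equation (15.1).
    """
    least_width = min(upper - lower for lower, upper, _ in classes)
    return [freq * least_width / (upper - lower) for lower, upper, freq in classes]

table_15_3 = [
    (0, 10, 20), (10, 20, 32), (20, 30, 8), (30, 40, 2),
    (40, 70, 60), (70, 80, 35), (80, 100, 10),
]

heights = adjusted_heights(table_15_3)
print(heights)
```

Classes with the least width keep their frequencies unchanged, while wider classes are scaled down, so that the area of each rectangle stays proportional to its frequency.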

The histogram for this frequency distribution is shown in Fig. 15.3.

Fig. 15.3: Histogram for Frequency Distribution when Class Intervals are of Unequal Width
Note 2: Sometimes, a histogram is also used for the frequency distribution of a
discrete variable. Each value of the discrete variable is regarded as the mid-
point of an interval. But generally, its use is not recommended, because in
discrete case each frequency actually corresponds to a single point and not to
an interval.
Now, you can try the following exercises.
E1) Draw a histogram from the following data

Class Interval:   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80   80-90
Frequency:        3      5       10      14      24      17      14      10      3

E2) Draw a histogram for the following frequency distribution

Wage (Rs):        0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40   40-45   45-50
No. of Workers:   30    70     100     110     140     150     130     100     90      60

(ii) Frequency Polygon


Another method of presenting a frequency distribution graphically, other than
the histogram, is to use a frequency polygon. To draw a frequency polygon, we
first plot the mid values of all the class intervals against the corresponding
frequencies as points with the help of the rectangular co-ordinate axes.
Secondly, we join these plotted points by line segments. The graph thus
obtained is known as a frequency polygon. One important point to keep in mind
is that whenever a frequency polygon is required, we take two imaginary class
intervals each with frequency zero, one just before the first class interval
and the other just after the last class interval. The addition of these two
class intervals ensures the property that
Area under the polygon = Area of the histogram
For example, if we take the frequency distribution as given in Table 15.2 then,
we have to first plot the points (5, 2), (15, 3), …, (75, 2) on graph paper along
with the horizontal bars. Then we join the successive points (including the mid
points of two imaginary class intervals each with zero frequency) by line
segments to get a frequency polygon. The frequency polygon for frequency
distribution given in Table 15.2 is shown in Fig. 15.4.
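The role of the two imaginary class intervals can be checked numerically. The Python sketch below builds the polygon points for Table 15.2 and compares the area under the polygon with the area of the histogram. It assumes equal class widths, and the function names `polygon_points` and `area_under` are hypothetical helpers introduced for this illustration.

```python
def polygon_points(classes):
    """Points of the frequency polygon for equal-width classes.

    classes: list of (lower limit, upper limit, frequency).
    One imaginary class with zero frequency is added at each end.
    """
    width = classes[0][1] - classes[0][0]
    mids = [((lower + upper) / 2, freq) for lower, upper, freq in classes]
    first = (mids[0][0] - width, 0)
    last = (mids[-1][0] + width, 0)
    return [first] + mids + [last]

def area_under(points):
    """Area under the line segments joining consecutive points (trapezoids)."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

table_15_2 = [
    (0, 10, 2), (10, 20, 3), (20, 30, 13), (30, 40, 18),
    (40, 50, 9), (50, 60, 7), (60, 70, 6), (70, 80, 2),
]

points = polygon_points(table_15_2)
polygon_area = area_under(points)
histogram_area = sum((upper - lower) * freq for lower, upper, freq in table_15_2)

print(points[0], points[-1])
print(polygon_area, histogram_area)
```

Without the two end points on the horizontal axis, the triangles cut off at both ends of the histogram would not be compensated and the two areas would differ.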


Fig. 15.4: Frequency Polygon for the Frequency Distribution given in Table 15.2.

Note 3: In some cases the first class interval does not start from zero. In such
situations we mark a kink on the horizontal axis, which indicates the
continuity of the scale starting from zero. Let us take an example of this type.

Example 1: Draw a frequency polygon for the following frequency distribution:

Class Interval   40-50   50-60   60-70   70-80   80-90   90-100   100-110   110-120
Frequency        4       10      11      13      18      14       11        5

Solution: Frequency polygon for the given data is shown in Fig. 15.5:

Fig. 15.5: Frequency Polygon for Continuous Frequency Distribution.

Now, you can try the following exercise.


E3) Draw a frequency polygon from the following data

Wage:             0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40   40-45   45-50
No. of Workers:   20    50     80      110     140     150     120     150     100     80

(iii) Frequency Curve


In simple words, a frequency curve is a smooth curve obtained by joining the
points (not necessarily all the points) of the frequency polygon such that
(a) Like the frequency polygon, it also starts from the base line (horizontal
axis) and ends at the base line.
(b) The area under the frequency curve remains approximately equal to the area
under the frequency polygon.
Let us now try to explain the concept theoretically. Suppose we draw a sample
of size n from a large population. A frequency curve is the graph of a
continuous variable. Theoretically, continuity of the variable implies that
however small a class interval we take, there will be some observations in that
class interval. That is, in this case there will be a large number of line
segments, and the frequency polygon tends to coincide with the smooth curve
passing through these points as the sample size (n) increases. This smooth
curve is known as the frequency curve.
In the following example we have drawn both frequency polygon and
frequency curve to make the idea clear for you.
Example 2: Draw frequency polygon and frequency curve for the following
frequency distribution.

Class Intervals   10-20   20-30   30-40   40-50   50-60   60-70   70-80   80-90
Frequency         2       5       8       15      18      10      3       1

Solution: Frequency polygon and frequency curve for the above data are given
below in Fig. 15.6.

Fig. 15.6: Frequency Curve along with Frequency Polygon.

Fig. 15.7 shows some important types of frequency curves which are generally
obtained in the graphical presentation of frequency distributions, namely
symmetrical, positively skewed, negatively skewed, J shaped, U shaped, bimodal
and multimodal frequency curves. Note that the shapes of these curves justify
their names.

Fig. 15.7

Now, you can try the following exercise.


E4) Draw a frequency curve from the following frequency distribution
Class Intervals: 1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-9
Frequency: 3 9 20 25 18 12 6 3
(iv) Cumulative Frequency Curves
In sub Sec. 13.2.2 of Unit 13 of this course you have already studied the
concepts of cumulative frequency, cumulative frequency distribution, less than
cumulative frequency, more than cumulative frequency, less than cumulative
frequency distribution, more than cumulative frequency distribution, etc. Here
our aim is to study the graphical presentation of less than and more than
cumulative frequency distributions, which are known as the less than
cumulative frequency curve (or less than ogive) and the more than cumulative
frequency curve (or more than ogive) respectively.
For drawing the less than cumulative frequency curve (or less than ogive), we
first plot the cumulative frequencies against the values (upper limits of the
class intervals) up to which they correspond, and then simply join the points
by line segments. The curve thus obtained is known as the less than ogive.
Similarly, the more than cumulative frequency curve (more than ogive) can be
obtained by plotting the more than cumulative frequencies against the lower
limits of the class intervals.
In other words, we may define the less than ogive and the more than ogive as
follows:
Less Than Ogive: If we plot the points with the upper limits of the classes as
abscissae and the cumulative frequencies corresponding to the values less than
the upper limits as ordinates, and join the points so plotted by line segments,
the curve thus obtained is known as the “less than cumulative frequency curve”
or “less than ogive”. It is a rising curve.

More Than Ogive: If we plot the points with the lower limits of the classes as
abscissae and the cumulative frequencies corresponding to the values more than
the lower limits as ordinates, and join the points so plotted by line segments,
the curve thus obtained is known as the “more than cumulative frequency curve”
or “more than ogive”. It is a falling curve.
Let us draw both the ogives (‘less than’ and ‘more than’) for the following
frequency distribution of the weekly wages of a number of workers given in
Table 15.5.

Table 15.5
Weekly wages      0-10   10-20   20-30   30-40   40-50
No. of workers    45     55      70      40      10

Before drawing the ogives, we make a cumulative frequency distribution as
given in Table 15.6.

Table 15.6
Weekly    No. of     Less than cumulative         More than cumulative
wages     workers    frequency distribution       frequency distribution
                     Wages        Number of       Wages        Number of
                     less than    workers         more than    workers
0-10      45         10           45              0            220
10-20     55         20           100             10           175
20-30     70         30           170             20           120
30-40     40         40           210             30           50
40-50     10         50           220             40           10

From above data, we construct both the ogives as shown in Fig. 15.8 and Fig.
15.9:

Fig. 15.8: Less Than Ogive.


Fig. 15.9: More Than Ogive.

For the “less than ogive” shown in Fig. 15.8, we have plotted the points
(10, 45), (20, 100), (30, 170), (40, 210), (50, 220) and then joined them by
line segments. Similarly, for the “more than ogive” shown in Fig. 15.9, we have
plotted the points (0, 220), (10, 175), (20, 120), (30, 50), (40, 10), and then
joined them by line segments.
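The cumulative frequencies of Table 15.6 and the points plotted for the two ogives can be reproduced with a short Python sketch. The variable names below are illustrative only and are not part of the course material.

```python
from itertools import accumulate

# Class limits and frequencies taken from Table 15.5
lower_limits = [0, 10, 20, 30, 40]
upper_limits = [10, 20, 30, 40, 50]
frequencies = [45, 55, 70, 40, 10]

# "Less than" cumulative frequencies: running total starting from the first class
less_than_cf = list(accumulate(frequencies))

# "More than" cumulative frequencies: running total starting from the last class
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Points to plot: (upper limit, less than cf) and (lower limit, more than cf)
less_than_points = list(zip(upper_limits, less_than_cf))
more_than_points = list(zip(lower_limits, more_than_cf))

print(less_than_points)
print(more_than_points)
```

The two printed lists are the coordinates that are joined by line segments to give the less than ogive and the more than ogive respectively.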
If we want to obtain a partition value using ogives, we draw a dotted
horizontal line through the point on the y-axis which corresponds to the
partition value. From the point where this line meets the less than ogive, we
draw a dotted vertical line and let it meet the x-axis. The abscissa of the
point where it meets the x-axis is the required partition value. For example,
suppose we want to find the first quartile. We draw a dotted horizontal line
starting from the y-axis at the point corresponding to N/4 and let it meet the
“less than ogive”. From that point on the “less than ogive”, we draw a dotted
vertical line and let it meet the x-axis. The abscissa corresponding to this
point is the first quartile. Similarly, for finding the median or second
quartile, we start the dotted horizontal line from the y-axis at the point
corresponding to N/2 and then proceed as described above. For the third
quartile, 3N/4 is taken in place of N/2. In this way, we may find any partition
value.
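The graphical reading described above amounts to linear interpolation along the line segments of the less than ogive. The following Python sketch illustrates this for the data of Table 15.6. It assumes that the ogive is extended back to the point (0, 0), i.e. the lower limit of the first class with cumulative frequency zero; this starting point and the function name `partition_value` are assumptions made here for illustration.

```python
def partition_value(points, target):
    """Return the abscissa at which the less than ogive reaches `target`.

    `points` are (x, cumulative frequency) pairs in increasing order,
    joined by straight line segments.
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1:
            if y1 == y0:          # flat segment: any x on it would do
                return x0
            # linear interpolation along the segment
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    raise ValueError("target lies outside the range of the ogive")

# Less than ogive of Table 15.6, with the assumed starting point (0, 0)
ogive = [(0, 0), (10, 45), (20, 100), (30, 170), (40, 210), (50, 220)]
N = ogive[-1][1]                  # total frequency, 220

q1 = partition_value(ogive, N / 4)
median = partition_value(ogive, N / 2)
q3 = partition_value(ogive, 3 * N / 4)
print(round(q1, 2), round(median, 2), round(q3, 2))
```

A value read from a hand-drawn graph will agree with such a computation only approximately, depending on the scale of the graph paper.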
Note 4: The median may also be obtained by drawing a dotted vertical line
through the point of intersection of both the ogives, when drawn on a single
figure.
Now, you can try the following exercises.
E5) Draw two ogives from the following data
Class: 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90
Frequency: 3 6 10 13 20 18 15 9 6
Hence find the median. Compare your result with the median obtained by direct
calculation.
E6) Draw less than ogive from the following frequency distribution of marks
of 90 students
Marks: 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79
No. of Students: 7 11 19 8 20 14 8 3
Hence find Q1, Q2 and Q3.
E7) Draw the more than ogive for the following frequency distribution of the
weekly wages of workers:
Weekly wages: 0-10 10-20 20-40 40-50 50-60 60-70 70-80 80-90 90-100
No. of Workers: 5 15 20 30 45 35 25 15 10

15.4 SUMMARY
In this unit we have discussed:
1) Various types of graphical presentation of data.
2) Way of drawing histogram for continuous frequency distributions.
3) Frequency polygon for a frequency distribution.
4) Frequency curves of different shapes, and
5) Way of drawing cumulative frequency graphs or ogives.

15.5 SOLUTIONS/ANSWERS

E1) Histogram of the given data is given below:

E2) Histogram of the given data is given below:

E3) Frequency polygon of the given data by first drawing histogram is given
on the next page.


Frequency polygon can also be drawn without histogram as given below:

Note 5: We can draw a frequency polygon either by first drawing a histogram or
directly.

E4) Frequency curve for the given data is given below.


E5) Two ogives for the given data are given on the next page.


Calculation of median through direct calculation:

Given data are

Class Interval (CI)    Frequency (f)    Cumulative frequency (cf)
0-10                         3                     3
10-20                        6                     9
20-30                       10                    19
30-40                       13                    32
40-50                       20                    52
50-60                       18                    70
60-70                       15                    85
70-80                        9                    94
80-90                        6                   100
Total                      100

Here $N = \sum f$ = sum of all frequencies = 100

$\frac{N}{2} = \frac{100}{2} = 50$ and hence the median class is 40-50.
So, in usual notations we have
l = lower limit of the median class = 40,
h = width of the median class = 50 − 40 = 10,
c = cumulative frequency preceding the median class = 32,
f = frequency of the median class = 20.
Now,
\[ \text{Median} = l + \frac{h}{f}\left(\frac{N}{2} - c\right) = 40 + \frac{10}{20}\left(\frac{100}{2} - 32\right) = 40 + \frac{1}{2}(50 - 32) = 40 + \frac{1}{2} \times 18 = 40 + 9 = 49 \]
Thus we see that the median obtained graphically by using the two ogives and
the median obtained by direct calculation are the same.
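The direct calculation above can be checked with a small Python sketch of the grouped-data median formula, Median = l + (h/f)(N/2 − c). The function name `grouped_median` is illustrative, and the sketch assumes that all classes have the same width h.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of a grouped frequency distribution with equal class widths."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the distribution has no observations")
    cumulative = 0                      # c: cumulative frequency before the class
    for l, f in zip(lower_limits, frequencies):
        if cumulative + f >= n / 2:     # first class whose cf reaches N/2
            return l + (width / f) * (n / 2 - cumulative)
        cumulative += f

# Data of exercise E5
lower_limits = [0, 10, 20, 30, 40, 50, 60, 70, 80]
frequencies = [3, 6, 10, 13, 20, 18, 15, 9, 6]

print(grouped_median(lower_limits, 10, frequencies))
```

For the data of E5 the median class is 40-50, with c = 32 and f = 20, as in the worked solution.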
E6) Less than ogive for the given frequency distribution is given below.
To find $Q_1$: $\frac{N}{4} = \frac{90}{4} = 22.5$. First of all we draw a dotted
horizontal line starting from the y-axis at 22.5, and then from the point
where it meets the less than ogive we draw a dotted vertical line. The value
corresponding to the point where this line meets the horizontal axis is the
value of $Q_1$, as shown in the figure. Similarly, the values of $Q_2$ and $Q_3$
can be obtained as shown in the figure below:

E7) More than ogive for the given frequency distribution is given below:

UNIT 16 GRAPHICAL PRESENTATION OF DATA-II
Structure
16.1 Introduction
Objectives
16.2 Time Series Graphs
Method of Drawing a Time Series Graph
Types of Time Series Graph
16.3 Stem-and-Leaf Displays
Stem-and-Leaf Display for More than one Set of Data
Merits of Stem-and-Leaf Display
16.4 Box Plots
Method of Construction of the Box Plots
Components of a Box Plot
Box Plots with Outliers
Box Plots with + Signs
Box Plots with Whisker, + Sign and Outliers
16.5 Summary
16.6 Solutions/Answers

16.1 INTRODUCTION
In Unit 15 of this block, we have discussed some of the techniques of graphical
presentation of data. In that unit, we have restricted ourselves to the graphical
methods which are used for representing frequency distributions. The present
unit discusses the graphical methods for time series data. A time series graph is
frequently used for analysing and presenting the time series data. Range chart
is a type of time series graph which is used for showing the range of variation.
The Band chart is another type of time series graph which shows the total for
successive time periods broken up into subtotals for each of the component
parts of the total. In this unit, we shall also discuss how data are
represented by stem-and-leaf displays and box plots. A stem-and-leaf display
is like a histogram, with the additional feature that the value of each
individual observation is also shown in the display. Further, in this unit we
shall discuss box plots, which represent the data through the five-number
summary.
Objectives
After studying this unit, you should be able to:
• describe a time series graph;
• describe the method of drawing a time series graph;
• draw a range chart and band chart;
• describe the method of drawing a stem-and-leaf display;
• describe the box plot and the different parts of the box plot; and
• draw the box plots.

16.2 TIME SERIES GRAPHS


A time series graph is drawn to analyse time series data. It brings out the
pattern of fluctuations in the data and helps in obtaining meaningful results
about their future behaviour. To draw a time series graph, time is measured
along the horizontal axis and the observed data along the vertical axis. The
points are then plotted against the magnitudes corresponding to each successive
time period, and finally these points are joined by line segments. The
resulting zigzag curve is called the time series graph. For example, a time
series graph for the production (in tons) of a commodity during the period
2001-2009, given in the following table, is shown in Fig. 16.1.

Years 2001 2002 2003 2004 2005 2006 2007 2008 2009
Production 30 25 35 40 25 30 40 45 20

Fig. 16.1: Production of a Commodity from 2001 to 2009 (line graph; years along
the x-axis, production along the y-axis)

In the rest of this section we shall discuss the method of drawing a time
series graph in subsection 16.2.1 and the types of time series graph in
subsection 16.2.2.
16.2.1 Method of Drawing a Time Series Graph
A time series is formed by the observations of a variable under study at
different phases of time. The time series graph is extremely helpful in
analysing the fluctuations in the values of the variable under study at different
phases of time. We generally take time variable along x-axis and the values or
magnitude of the observations of the variable under study along y-axis. After
plotting the values of the variable against the corresponding values of the time
variable as points, we join such points by line segments. The graph so drawn is
known as a time series graph or line graph. This type of graph is the simplest
to understand, the easiest to draw and the most widely used in practice.
Several variables can be shown on the same graph, so that a comparison can
also be made. The following points should be kept in mind while constructing a
time series graph:
a) The scale of the y-axis should begin from zero even if the lowest y-value
associated with any x-period or value is far from zero. If necessary, the
concept of kink can be used. (Kink: refer to Note 3 on page number 52 of
Unit 15 of this course.)
b) If the unit of measurement is the same, we can represent two or more
variables on the same graph.
c) Not more than 5 or 6 variables should be shown on the same graph, otherwise
the chart/graph becomes quite confusing.
d) When two or more variables are to be shown on the same graph, it is
advised to use different designs of lines to distinguish between the
variables.

16.2.2 Types of Time Series Graph
There are two types of time series graphs:
(i) Range Chart (ii) Band Chart
Let us discuss these one by one:
(i) Range Chart
A range chart is a very useful method of showing the range of variation or
fluctuation between the maximum and minimum values of a variable at the same
point of time. For example, if we are interested in showing the minimum and
maximum prices of a commodity for different periods of time, or the minimum
and maximum marks obtained by the students in different years, etc., the range
chart would be the appropriate option.
For drawing a range chart, we take the time variable along the x-axis and the
value of the other variable along the y-axis. Then we draw two line graphs
together by plotting the maximum and minimum observations in the given data:
one curve represents the highest values of the variable at different points of
time and the other represents the lowest values at the same points of time.
The gap between the two curves represents the range of variation. To highlight
the difference between the lowest and highest values, some colour or shade
should be used. Let us take an example of drawing a range chart.
Example 1: Represent the following data by range chart.
Days          Max. Temp. (in °C)    Min. Temp. (in °C)
Monday               38                    12
Tuesday              41                    16
Wednesday            35                    14
Thursday             42                    15
Friday               44                    18
Saturday             45                    20
Sunday               46                    21

Solution: Since there are two variables with same scales of measurement, both
the variables are shown on the same graph as in Fig. 16.2.

Fig. 16.2 Range Chart for the Maximum and Minimum Temperature in a Week
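The gap between the two curves of a range chart is simply the difference between the maximum and minimum values at each point of time. A minimal Python sketch for the data of Example 1 is given below. The variable names are illustrative, and the sketch only computes the ranges; it does not draw the chart.

```python
# Data of Example 1 (temperatures in degrees Celsius)
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
max_temp = [38, 41, 35, 42, 44, 45, 46]
min_temp = [12, 16, 14, 15, 18, 20, 21]

# Range of variation for each day = maximum - minimum
ranges = {day: hi - lo for day, hi, lo in zip(days, max_temp, min_temp)}

for day, r in ranges.items():
    print(day, r)
```

The smallest gap occurs on Wednesday and the largest on Thursday, which is what the shaded band in the range chart conveys visually.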

(ii) Band Graph
A band graph is another type of time series graph, which shows the total for
successive time periods broken up into subdivisions for each of the components
of the total. In other words, the band graph represents the range of the
components of the total and shows how the total is distributed among them. The
different components of the total are plotted as line graphs one over the
other, and the gaps between successive lines are filled with different shades,
colours, etc., so that the graph looks like a series of bands. The band graph
is especially useful for representing totals such as total costs, total sales,
total production, etc. divided into their respective components. For example,
total production may be divided into components by nature of commodity,
machines, plants, etc. The band graph is also used where the data are put in
percentage form: the whole graph depicts 100% and the bands depict the
percentages of the various components of the total. Let us take an example of
drawing a band graph.
Example 2: Present the data on the amount of production (in million tonnes) of
various plants from 2000-01 to 2007-08 given in the following table:
Year Plant-1 Plant-2 Plant -3 Plant -4
2000-01 42.6 38.3 26.5 31.7
2001-02 48.7 36.4 28.6 34.7
2002-03 47.2 32.4 25.3 30.8
2003-04 44.8 38.8 30.1 29.6
2004-05 46.7 37.2 27.4 32.6
2005-06 45.2 34.9 25.2 35.4
2006-07 49.1 37.8 29.1 38.2
2007-08 48.2 37.8 28.4 33.5
Solution: The above data can be most suitably presented through a band graph.
We proceed for constructing such a graph as follows:
We take the time on the x-axis and the other variable on the y-axis. Then we
plot the various points for different years for Plant-1 and join them by straight
line segments. This is represented by curve A (see Fig. 16.3). Now we add the
values of production of Plant-2 for various years to the values of production of
Plant-1 and plot the values and finally join them by straight line segments. This
is represented by curve B (see Fig. 16.3). The difference between curves B
and A gives the production of Plant-2. Now we add the values of production of
Plant-1 and Plant-2 to that of Plant-3 and plot the various points. This is
represented by curve C (see Fig. 16.3). The difference between curve C and
curve B represents the production of Plant-3. After that we add the values of
production of Plant-1, Plant-2 and Plant-3 to the values of production of
Plant-4 and draw a curve. This is represented by curve D (see Fig. 16.3). The
difference between curve D and curve C represents the production of Plant-4.
Using these steps, the required band graph is shown in Fig. 16.3.

Fig. 16.3: Band Graph for the Production in four Plants
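The heights of curves A, B, C and D in a band graph are running totals of the component values for each year. The following Python sketch computes them for the data of Example 2. The dictionary layout and names are illustrative assumptions, and values are rounded to one decimal place to avoid floating-point noise.

```python
from itertools import accumulate

# Production (in million tonnes) of Plant-1 to Plant-4, from Example 2
production = {
    "2000-01": [42.6, 38.3, 26.5, 31.7],
    "2001-02": [48.7, 36.4, 28.6, 34.7],
    "2002-03": [47.2, 32.4, 25.3, 30.8],
    "2003-04": [44.8, 38.8, 30.1, 29.6],
    "2004-05": [46.7, 37.2, 27.4, 32.6],
    "2005-06": [45.2, 34.9, 25.2, 35.4],
    "2006-07": [49.1, 37.8, 29.1, 38.2],
    "2007-08": [48.2, 37.8, 28.4, 33.5],
}

# Heights of curves A, B, C, D: cumulative sums over the plants for each year
curves = {
    year: [round(total, 1) for total in accumulate(values)]
    for year, values in production.items()
}

for year, heights in curves.items():
    print(year, heights)
```

The last number in each row is the height of curve D, i.e. the total production of all four plants in that year.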

Now, you can try the following exercises.
E1) Draw a range chart for the following data:
Class: 1 2 3 4 5 6 7 8 9 10
Max. Marks: 58 65 74 61 87 65 78 92 67 84
Min. Marks: 15 21 25 32 26 16 19 22 24 17
E2) Draw a band graph for the following data of quarterly results for profit (in
lakhs of rupees):
Quarters: Plant-I Plant-II Plant-III
Quarter-I 34 43 46
Quarter-II 41 47 41
Quarter-III 38 39 44
Quarter-IV 51 57 53

16.3 STEM-AND-LEAF DISPLAYS


In previous units of this block as well as in the previous section of this
unit, we have seen that data can be represented in a variety of ways including
graphs, charts and tables. In this section we discuss another type of graph
named the stem-and-leaf display. A stem-and-leaf display is very similar to a
histogram but shows more information: it summarises the shape of a set of data
quickly while retaining the individual data values. Nowadays the use of
stem-and-leaf displays is increasing, so let us formally define it in the next
paragraph with some examples.
A stem-and-leaf display has a vertical line of numbers, called starting parts,
obtained by removing the last digits (i.e. unit digits) from the given numbers.
For each starting part there is a horizontal line of numbers, called leaves,
which are the digits at the unit places of the given numbers. Each complete
horizontal line, including the starting part and its leaves, is known as a
stem. Data displayed in this way form a stem-and-leaf display. The distance
between the lowest values that can be recorded in two consecutive stems is
known as the stem width or category interval, which plays a very important
role in stem-and-leaf displays.
Stem-and-leaf displays are used in many situations like series of scores of
sports teams, series of temperature or rainfall over a period of time, series of
classroom test scores, etc.
Following example will illustrate the above discussion more clearly.
Example 3: We have a set of values of the test scores of 22 students in a class
as 11, 2, 28, 33, 48, 0, 42, 17, 24, 14, 0, 18, 26, 29, 35, 42, 22, 8, 28, 8, 46, 14.
Draw a simple stem-and-leaf display by taking stem width 10.
Solution: Simple stem-and-leaf display for the given data can be shown as
follows:
0 20088 (5)
1 17484 (5)
2 846928 (6)
3 35 (2)
4 8226 (4)
Total observations = 22

After arranging the leaves in ascending order of magnitude, we have
0 00288
1 14478
2 246889
3 35
4 2268
Here the starting parts show the ‘tens digits’ and the leaves show the ‘ones
digits’ in the above stem-and-leaf display. At a glance, one can see that 4
students got marks in the 40s in their test out of 50. Out of these four
students two got 42 marks each, whereas the other two got 46 and 48 marks in
the test. The fourth row (i.e. fourth stem) indicates that two students got
marks in the 30s, and the actual marks of these two students are 33 and 35.
Similarly, we can get the information about the marks of the other students
from the other rows (stems). Counting the total number of leaves gives the
number of students who appeared in the test. The information is nicely
organised when a stem-and-leaf display is used. It provides a tool for locating
specific information in large sets of data; otherwise one would have a long
list of marks to arrange and analyse.
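The construction of a simple stem-and-leaf display with stem width 10 can be sketched in Python as follows. The function name `stem_and_leaf` is illustrative. The sketch assumes non-negative whole numbers and, like the display of Example 3, lists only the stems that contain at least one value.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {starting part: leaves} for stem width 10, leaves in ascending order."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)    # tens part is the stem, unit digit the leaf
    return {stem: "".join(str(d) for d in leaves)
            for stem, leaves in sorted(rows.items())}

# Test scores of Example 3
scores = [11, 2, 28, 33, 48, 0, 42, 17, 24, 14, 0,
          18, 26, 29, 35, 42, 22, 8, 28, 8, 46, 14]

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(stem, leaves, f"({len(leaves)})")
```

The number in brackets on each printed line is the count of leaves on that stem, and these counts add up to the total number of observations.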
16.3.1 Stem-and-Leaf Display for more than one Set of Data
Stem-and-leaf display is also used to compare two sets of data. Such a display
is known as a ‘back to back’ stem-and-leaf display. For example, if you want
to compare the batting scores of two cricket players, then a stem-and-leaf
display is the right way to represent the data.
Example 4: Draw a stem-and-leaf display for batting scores of two players
given below.
Player A 102, 61, 82, 88, 90, 63, 69, 85, 105, 93, 65, 94, 107, 97, 67
Player B 104, 62, 83, 95, 106, 95, 108, 63, 108, 82, 93, 109
Solution: The scores of two players can be compared with the help of back to
back stem-and-leaf display as follows:
Leaf (Player A) Starting part Leaf (Player B)
13579 6 23
258 8 23
0347 9 355
257 10 46889
Here the column of starting parts is in the middle and the leaves columns are
to the right (player B) and left (player A) of it. You can see that player B
has more innings with high scores than player A: player B has 5 innings with
scores of 100 or more, while player A has 3. Player B has only 2 innings in the
60s, with scores of 62 and 63, while player A has 5 innings with the scores of
61, 63, 65, 67 and 69. You can also see that player B has the highest score of
109, compared to player A with a highest score of 107. Thus we see that
presenting the data by a stem-and-leaf display provides a lot of information
very quickly.
In above two examples stem width or category interval was 10. Now we take
an example in which stem width is 5 instead of 10.
Example 5: Arrange the numbers 47, 35, 37, 20, 43, 15, 15, 26, 46, 25, 29, 12,
39, 44, 21, 24, 16, 40, 19, 46, 30, 34, 17, 39, 16, 40, 31, 21, 14, 42, 16, 43,
22, 11, 24, 25, 31, 27, 40, 33 in a stretched stem-and-leaf display that has
single-digit starting parts and leaves, but has a stem width of 5.

Solution: A simple stem-and-leaf display has a unique starting part for each
stem with stem width 10, while the stretched stem-and-leaf display shown
below has a stem width of 5, which means we have stretched each stem (of stem
width 10) into two stems each of width 5, and the same starting part is used
for both stems (i.e. for stem 1 we use 1a and 1b, for stem 2 we use 2a and 2b,
etc., as explained below the display). The required stretched stem-and-leaf
display is given as follows:
1a 241
1b 5569766
2a 014124
2b 65957
3a 04113
3b 5799
4a 3400230
4b 766
In this stem-and-leaf display ‘a’ stands for the unit digits 0-4 and ‘b’
stands for the unit digits 5-9. The values between 10 and 19 of stem 1 are now
represented by two stems 1a and 1b, which include the values 10-14 and 15-19
respectively. Similarly, the values between 20 and 29 of stem 2 are now
represented by two stems 2a and 2b, which include the values 20-24 and 25-29
respectively, and so on.
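The splitting of each stem into an ‘a’ part (unit digits 0-4) and a ‘b’ part (unit digits 5-9) can be sketched in Python as shown below. The function name `stretched_stem_and_leaf` is illustrative. Unlike the display above, which lists the leaves in the order in which the numbers are given, this sketch lists the leaves in ascending order.

```python
def stretched_stem_and_leaf(values):
    """Return {label: leaves} for stem width 5, with labels such as '1a' and '1b'."""
    rows = {}
    for v in sorted(values):
        tens, unit = divmod(v, 10)
        label = f"{tens}{'a' if unit <= 4 else 'b'}"   # a: 0-4, b: 5-9
        rows.setdefault(label, []).append(unit)
    return {label: "".join(str(d) for d in leaves)
            for label, leaves in rows.items()}

# Numbers of Example 5
numbers = [47, 35, 37, 20, 43, 15, 15, 26, 46, 25, 29, 12, 39, 44,
           21, 24, 16, 40, 19, 46, 30, 34, 17, 39, 16, 40, 31, 21,
           14, 42, 16, 43, 22, 11, 24, 25, 31, 27, 40, 33]

display = stretched_stem_and_leaf(numbers)
for label, leaves in display.items():
    print(label, leaves)
```

Because the values are sorted before being grouped, the stems are produced in the order 1a, 1b, 2a, 2b, and so on.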
16.3.2 Merits of Stem-and-Leaf Display
Following are some merits of stem-and-leaf display:
(i) Stem-and-leaf display arranges the data in place values.
(ii) Total number of observations and mode can easily be obtained from stem-
and-leaf display (see Example 3).
(iii) Summarises the shape of a set of data (the distribution) and provides the
detail regarding individual values.
(iv) Stem-and-leaf display also enables you to find quantiles such as the
median, quartiles (i.e. $Q_1, Q_2, Q_3$), deciles (i.e. $D_1, D_2, D_3, \ldots, D_9$),
percentiles (i.e. $P_1, P_2, P_3, \ldots, P_{99}$), etc., as discussed below.
Formula for Calculating Quantiles: First of all, the given observations are
arranged in ascending order of magnitude. Then the $(j/m)$th quantile, denoted
by $Q_{j/m}$ (e.g. 7/10 of the data are below $Q_{7/10}$), is given by
$Q_{j/m} = x_i$, where $x_i$ is that value of the variable below which a
fraction $j/m$ of the observations lie and
\[ i = \frac{j\,n}{m} + \frac{1}{2} \qquad \ldots (16.1) \]
where n = total number of observations.
For example, let us apply this formula for finding the median for the data of
Example 3:
Median = second quartile = $Q_{2/4}$:
\[ i = \frac{j\,n}{m} + \frac{1}{2} = \frac{2 \times 22}{4} + \frac{1}{2} = 11.5 \]
∴ median = $x_{11.5}$ = 11th observation in the array
+ 0.5(12th observation − 11th observation)
= 22 + 0.5(24 − 22) = 23
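Formula (16.1), together with the interpolation between neighbouring observations used in the median example, can be sketched in Python as follows. The function name `quantile` is illustrative. The handling of positions that fall before the first or after the last observation is an assumption added here, since the text does not discuss that case.

```python
import math

def quantile(values, j, m):
    """(j/m)th quantile using position i = j*n/m + 1/2 from formula (16.1)."""
    data = sorted(values)
    n = len(data)
    i = j * n / m + 0.5           # 1-based, possibly fractional, position
    k = math.floor(i)             # whole part of the position
    frac = i - k                  # fractional part of the position
    if k < 1:                     # assumption: clamp to the smallest value
        return data[0]
    if k >= n:                    # assumption: clamp to the largest value
        return data[-1]
    # kth observation + frac * ((k+1)th observation - kth observation)
    return data[k - 1] + frac * (data[k] - data[k - 1])

# Test scores of Example 3
scores = [11, 2, 28, 33, 48, 0, 42, 17, 24, 14, 0,
          18, 26, 29, 35, 42, 22, 8, 28, 8, 46, 14]

print(quantile(scores, 2, 4))     # median = second quartile
```

The same function can be used for deciles and percentiles by taking m = 10 and m = 100 respectively.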

Now, you can try the following exercises.
E3) Draw a stem-and-leaf display with the following marks obtained by 30
students.
77, 80, 82, 68, 65, 59, 61, 57, 50, 62, 61, 70, 69, 64, 67,
70, 62, 65, 65, 73, 76, 87, 80, 82, 83, 79, 79, 71, 80, 77
Also determine the median for the marks.
E4) Draw a stem-and-leaf display for the following data.
31, 42, 22, 27, 33, 57, 67, 58, 64, 44, 65, 59, 46, 61, 35, 26, 63
Also find seventh decile.
E5) Draw a stem-and-leaf display for the given data:
141, 137, 105, 139, 107, 144, 110, 135, 117, 125, 147,113, 109, 120,
132, 110, 130, 112
Also find sixty seventh percentile.

16.4 BOX PLOTS


In the previous section we have discussed stem-and-leaf displays. Now let us
discuss another type of plot, known as the box plot. In descriptive statistics,
a box plot, also known as a box-and-whisker plot, is a convenient way of
graphically representing numerical data. It represents the data through the
five-number summary, i.e. the smallest observation (sample minimum), lower
quartile ($Q_1$), median ($Q_2$), upper quartile ($Q_3$), and the largest
observation (sample maximum). Box plots are used to describe a
distribution generally when it is extremely skewed or multimodal. A box plot
also indicates which observation(s), if any, might be considered as outliers. A
box plot is a quick graphic approach for examining one or more sets of data.
Box plots display differences between populations without making any
assumptions of the distributions. The spacing between the different parts of the
box helps in indicating the degree of dispersion (spread) and skewness in the
data, and identifies outliers. Box plots can be drawn either horizontally or
vertically. Here we will draw box plots vertically.
16.4.1 Method of Construction of the Box Plots
Box plots are useful for identifying key values while comparing two or more
distributions. To understand more clearly the method of constructing the box
plots, let us consider the following data of 37 students in a class who were
examined by a game with a box containing some balls. Their task was to select
a ball from one box placed at one corner and put it in another blank box placed
at another corner of the class as quickly as possible, and their times (in
seconds) were recorded. The scores were compared for the 16 boys and 21 girls
who participated in the game. The observed data are given in the following
table:
Time (in seconds) for completing the given task
Boys 18, 19, 20, 22, 24, 25, 22, 16, 17, 19, 25, 27, 28, 23, 23, 31
Girls 15, 17, 18, 19, 20, 21, 23, 14, 17, 18, 19, 20, 21, 24, 19, 16, 17, 18, 20, 22, 28

The method of construction of separate box plots for the data of boys and girls
is discussed below:
There are several ways of constructing a box plot. The first relies on the
quartiles and the lowest and greatest values in the distribution of scores.
Fig. 16.4 shows how these statistics are used for the above example. We draw a
box for each gender extending from the 1st quartile to the 3rd quartile. The
2nd quartile is drawn inside the box. Therefore,
(i) The bottom of each box is the 1st quartile.
(ii) The top is the 3rd quartile.
(iii) The line in the middle is the 2nd quartile.
(iv) A line extending from the point corresponding to the smallest
observation to the 1st quartile is drawn and is known as the lower whisker.
(v) A line extending from the 3rd quartile to the point corresponding to the
largest observation is drawn and is known as the upper whisker.
Let us arrange the given data for each gender in ascending order, as shown in
the following table, to find the above components, i.e. $Q_1$, $Q_2$, $Q_3$,
the smallest observation and the largest observation.

Gender Times (in Seconds)


Boys 16, 17, 18, 19, 19, 20, 22, 22, 23, 23, 24, 25, 25, 27, 28, 31
Girls 14, 15, 16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 22, 23, 24, 28

For Boys
The lowest or smallest observation = $x_s$ = 16.
First quartile $Q_1 = \left(\frac{16+1}{4}\right)$th item
= 4.25th item = 4th item + 0.25 (5th item − 4th item)
= 19 + 0.25 (19 − 19) = 19
Second quartile $Q_2 = \left(2 \times \frac{16+1}{4}\right)$th item
= 8.5th item = mean of 8th and 9th items
= $\frac{22+23}{2}$ = 22.5
Third quartile $Q_3 = \left(3 \times \frac{16+1}{4}\right)$th item
= 12.75th item = 12th item + 0.75 (13th item − 12th item)
= 25 + 0.75 (25 − 25) = 25
The largest observation = $x_l$ = 31
For Girls
The lowest or smallest observation = $x_s$ = 14
First quartile $Q_1 = \left(\frac{21+1}{4}\right)$th item
= 5.5th item = mean of 5th and 6th items = $\frac{17+17}{2}$ = 17
Second quartile $Q_2 = \left(2 \times \frac{21+1}{4}\right)$th item = 11th item = 19
Third quartile $Q_3 = \left(3 \times \frac{21+1}{4}\right)$th item
= 16.5th item = mean of 16th and 17th items = $\frac{21+21}{2}$ = 21
The largest observation = $x_l$ = 28
Box plots for boys and girls on the basis of the above findings are shown below
in Fig. 16.4.

Fig. 16.4: The Box Plots with the Whiskers for Boys and Girls
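The five-number summary used for the box plots above can be computed with the following Python sketch, which places the jth quartile at the (j(n + 1)/4)th item of the arranged data and interpolates between neighbouring items, as in the calculations for boys and girls. The function names are illustrative, and the sketch assumes at least three observations so that all three positions fall inside the data.

```python
def item(data, position):
    """Value at a 1-based, possibly fractional, position in arranged data."""
    k = int(position)
    frac = position - k
    if frac == 0:
        return data[k - 1]
    return data[k - 1] + frac * (data[k] - data[k - 1])

def five_number_summary(values):
    """Return (smallest, Q1, Q2, Q3, largest) using positions j(n + 1)/4."""
    data = sorted(values)
    n = len(data)
    if n < 3:
        raise ValueError("at least three observations are assumed")
    q1, q2, q3 = (item(data, j * (n + 1) / 4) for j in (1, 2, 3))
    return data[0], q1, q2, q3, data[-1]

# Arranged data from the table above
boys = [16, 17, 18, 19, 19, 20, 22, 22, 23, 23, 24, 25, 25, 27, 28, 31]
girls = [14, 15, 16, 17, 17, 17, 18, 18, 18, 19, 19,
         19, 20, 20, 20, 21, 21, 22, 23, 24, 28]

print(five_number_summary(boys))
print(five_number_summary(girls))
```

Note that this position rule differs from formula (16.1) of Sec. 16.3, so the two rules can give slightly different quartiles for the same data.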

16.4.2 Components of a Box Plot


Let us now discuss the various components and terminologies which are used
in drawing the various types of box plots.
1. Upper and Lower Hinges
The upper and lower hinges are constructed to represent the third quartile
and first quartile respectively. For the example discussed above, the values
of upper and lower hinges are 21 and 17 for the girls whereas for the boys
they are 25 and 19 respectively.
2. H-Spread
This is calculated by taking the difference between the upper and lower
hinges. The H-spread shows the spread of the elements of the data
between the first quartile and third quartile. For the data related to boys in
the example discussed above, the H-spread is 25 – 19 = 6 whereas for girls it
is 21 – 17 = 4.
3. Whiskers
The lines extending above and below the box are called whiskers. Lower
whisker extends from the point corresponding to the smallest observation to
Q1 and the upper whisker extends from Q3 to the largest observation.
4. Step
The step is calculated by multiplying the H-spread by 1.5. For the data of
boys in the above example, the value of 1 step is 1.5 × 6 = 9, whereas for
girls it is 1.5 × 4 = 6.
5. Upper and Lower Inner Fences
The upper and lower inner fences are calculated by adding one step to the
upper hinge and subtracting one step from the lower hinge respectively. In
other words, the upper inner fence is one step above the upper hinge whereas
the lower inner fence is one step below the lower hinge. For the data of boys
in the example discussed above, the upper and lower inner fences are
25 + 9 = 34 and 19 – 9 = 10, whereas for the data of girls the upper and lower
inner fences are 21 + 6 = 27 and 17 – 6 = 11 respectively.
6. Upper and Lower Outer Fences
The upper and lower outer fences are calculated by adding two steps to the
upper hinge and subtracting two steps from the lower hinge respectively. For
the data of boys in the example discussed above, the upper and lower outer
fences are 25 + 2 × 9 = 43 and 19 – 2 × 9 = 1, whereas for the data of girls
they are 21 + 2 × 6 = 33 and 17 – 2 × 6 = 5 respectively.
7. Upper and Lower Adjacent
The upper and lower adjacent are used to represent the largest and the
smallest observations of the data. For the data of boys in the example
discussed above upper and lower adjacent are 31 and 16 whereas for the
girls data they are 28 and 14 respectively.
8. Outside Value
The outside value is a value which is beyond an inner fence but not beyond
an outer fence. This is used to represent the scattered values of the data. To
represent these values circles are used.
9. Far Out Value
A value which is beyond the upper outer fence or lower outer fence is
known as far out value. To represent these values, the asterisks are used.
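The H-spread, step, fences and the classification of values into outside and far out values can be sketched in Python as follows. The function names `fences` and `classify` are illustrative, and the hinges are taken from the calculations of sub-section 16.4.1.

```python
def fences(q1, q3):
    """Inner and outer fences from the lower hinge q1 and upper hinge q3."""
    h_spread = q3 - q1
    step = 1.5 * h_spread
    return {
        "inner": (q1 - step, q3 + step),           # one step beyond the hinges
        "outer": (q1 - 2 * step, q3 + 2 * step),   # two steps beyond the hinges
    }

def classify(values, q1, q3):
    """Return (outside values, far out values) for the given hinges."""
    f = fences(q1, q3)
    (in_lo, in_hi), (out_lo, out_hi) = f["inner"], f["outer"]
    far_out = [v for v in values if v < out_lo or v > out_hi]
    outside = [v for v in values
               if (v < in_lo or v > in_hi) and out_lo <= v <= out_hi]
    return outside, far_out

girls = [14, 15, 16, 17, 17, 17, 18, 18, 18, 19, 19,
         19, 20, 20, 20, 21, 21, 22, 23, 24, 28]
boys = [16, 17, 18, 19, 19, 20, 22, 22, 23, 23, 24, 25, 25, 27, 28, 31]

print(fences(19, 25), classify(boys, 19, 25))
print(fences(17, 21), classify(girls, 17, 21))
```

For the girls' data the value 28 lies between the upper inner fence (27) and the upper outer fence (33), so it is an outside value; the boys' data have no value beyond an inner fence.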
Box plot with the components discussed above for data of boys is shown in
Fig 16.5.

Fig. 16. 5 Box Plot for Boys with Inner and Outer Fences

Box plot with the components discussed above for data of girls is shown in
Fig. 16.6.

Fig. 16.6 Box Plot for Girls with Inner and Outer Fences

16.4.3 Box Plots with Outliers


Outside value(s) and far out value(s) are known as extreme observations. If
extreme observations are present in the data, they may be represented in the
box plot by individual marks. The extreme observations are described as
outliers when they are represented in box plots. An outside value is a value
which is beyond an inner fence but not beyond an outer fence, whereas a far out
value is a value which is beyond the lower or upper outer fence. The individual
marks for extreme values are plotted above and below the whiskers in the box
plot. Specifically, outside values are indicated by small circles. In the data
of girls in the above example, 28 is the only outside value (it lies beyond the
upper inner fence 27 but not beyond the upper outer fence 33), whereas in the
data of boys no value is beyond the lower or upper inner fence. The box plot
for the girls' data indicating the outside value by a circle is shown in
Fig. 16.6.

Fig. 16.6: The Box Plot for Girls with the Outlier.

16.4.4 Box Plots with + Signs


One more component which may be included in box plots is the value of some
important measure such as the mean, mode, etc. We indicate the mean score for
a group of values by inserting a plus sign in the box plot. For the example
discussed in sub-section 16.4.1, the mean of the arranged boys' data is 22.4375
and the mean of the girls' data is 19.33; these are shown by plus signs in
Fig. 16.7.

Fig. 16.7: The Box Plot with Whisker and + Signs for Boys and Girls Data.
Fig. 16.7 provides a revealing summary of the data. Since half of the scores
are between the hinges (recall that the hinges are the first and third
quartiles), we see that half of the girls' times are between 17 and 21, whereas
half of the boys' times are between 19 and 25.
16.4.5 Box Plots with Whisker, + Sign and Outliers
On the basis of data of the example discussed in sub-section 16.4.1 we see that
girls generally dropped the balls from one box to another faster than boys. We
also see that one boy was slower than almost all of the girls (except 3). Fig.
16.8 shows the box plot for the girls' data with whisker, + sign and outliers.

Fig. 16.8: A Box-Plot for the Girl’s Data with Whisker, + Sign and Outliers

Note 2: A learner who is interested in knowing more about the topics discussed
in Secs. 16.3 and 16.4 may refer to chapters 6 and 7 of the book at serial
number 5 in the reference books listed below the introduction of MST-001 on
page number 4 of Block 1.
Now, you can try the following exercises.
E6) Draw a box plot for the given data:
17, 15, 17, 20, 13, 15, 15, 16, 16, 15, 19, 12, 19, 14, 11, 14, 16, 10, 19, 18,
20, 14, 17, 19, 16, 22, 21, 23, 14, 12, 18, 13, 12, 25, 14, 15, 31, 17, 10, 21
E7) Draw a box plot for the given data:
31, 42, 22, 27, 33, 27, 37, 28, 34, 44, 25, 39, 26, 31, 26, 33, 46, 48, 50

16.7 SUMMARY
In this unit, we have discussed:

1) Time series data and methods of drawing a time series graph.

2) How to draw a range chart and band chart.

3) Methods of drawing stem-and-leaf displays.

4) The box plots and the different components of the box plot.

16.8 SOLUTIONS / ANSWERS


E1) Range chart of the given data is given below.

E2) Required band graph of the given data is given below.

E3) Stem-and-leaf display of the given data of marks obtained by 30 students
is given below.

5 | 970
6 | 85121947255
7 | 700369917
8 | 0270230

After arranging the leaves in ascending order of magnitude, we have

5 | 079
6 | 11224555789
7 | 001367799
8 | 0002237

Median = second quartile = $Q_{2/4}$:

$$i = \frac{j}{m}\,n + \frac{1}{2} = \frac{2}{4} \times 30 + \frac{1}{2} = 15.5$$

Therefore, median = $x_{15.5}$ = 15th value in the array + 0.5 (16th value - 15th value)
= 70 + 0.5 (70 - 70) = 70
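
The position rule used in this solution, i = (j/m) n + 1/2 followed by linear interpolation between neighbouring values, can be checked with a short Python sketch. The function name `quantile_from_position` is hypothetical, and the list of marks is read back from the ordered stem-and-leaf display of E3. Treat it as one possible way to reproduce the hand calculation, not as a standard library routine.

```python
def quantile_from_position(sorted_values, j, m):
    """Return Q_{j/m} using the 1-based position i = (j/m) * n + 1/2.

    The whole part of i selects a value in the ordered array and the
    fractional part interpolates towards the next value.
    """
    n = len(sorted_values)
    i = (j / m) * n + 0.5
    if i <= 1:                      # position falls before the first value
        return sorted_values[0]
    if i >= n:                      # position falls at or after the last value
        return sorted_values[-1]
    k = int(i)                      # whole part of the position
    frac = i - k                    # fractional part of the position
    lower, upper = sorted_values[k - 1], sorted_values[k]
    return lower + frac * (upper - lower)


# Marks of the 30 students, read from the ordered display of E3.
marks = [50, 57, 59,
         61, 61, 62, 62, 64, 65, 65, 65, 67, 68, 69,
         70, 70, 71, 73, 76, 77, 77, 79, 79,
         80, 80, 80, 82, 82, 83, 87]

print(quantile_from_position(marks, 2, 4))   # median; prints 70.0
```

The same function gives a decile with m = 10 or a percentile with m = 100, since only the fraction j/m changes.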
E4) Stem-and-leaf display of the given data is given below.

2 | 276
3 | 135
4 | 246
5 | 789
6 | 74513

After arranging the leaves in ascending order of magnitude, we have

2 | 267
3 | 135
4 | 246
5 | 789
6 | 13457

$D_7$ = seventh decile = $Q_{7/10}$:

$$i = \frac{j}{m}\,n + \frac{1}{2} = \frac{7}{10} \times 17 + \frac{1}{2} = 12.4$$

Therefore, $D_7$ = $x_{12.4}$ = 12th value in the array + 0.4 (13th value - 12th value)
= 59 + 0.4 (61 - 59) = 59.8
E5) Stem-and-leaf display of the given data is given below.

10 | 579
11 | 07302
12 | 50
13 | 79520
14 | 147

After arranging the leaves in ascending order of magnitude, we have

10 | 579
11 | 00237
12 | 05
13 | 02579
14 | 147

$P_{67}$ = sixty-seventh percentile = $Q_{67/100}$:

$$i = \frac{j}{m}\,n + \frac{1}{2} = \frac{67}{100} \times 18 + \frac{1}{2} = 12.56$$

Therefore, $P_{67}$ = $x_{12.56}$ = 12th value in the array + 0.56 (13th value - 12th value)
= 132 + 0.56 (135 - 132) = 133.68
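
The two steps of building a stem-and-leaf display (split each value into a stem and a leaf, then order the leaves within each stem) can also be sketched in Python. The helper names `stem_and_leaf` and `format_display` are hypothetical, the last digit is assumed to be the leaf, and the values are read back from the unordered display of E5. Stems that contain no values are simply omitted in this sketch.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its leaves in ascending order."""
    groups = defaultdict(list)
    for v in sorted(values):            # sorting first keeps the leaves ordered
        stem, leaf = divmod(v, 10)      # e.g. 137 -> stem 13, leaf 7
        groups[stem].append(leaf)
    return dict(groups)


def format_display(groups):
    """Render one 'stem | leaves' line per stem, smallest stem first."""
    lines = []
    for stem in sorted(groups):
        leaves = "".join(str(leaf) for leaf in groups[stem])
        lines.append(f"{stem} | {leaves}")
    return "\n".join(lines)


# Values read from the unordered display of E5.
values = [105, 107, 109, 110, 117, 113, 110, 112, 125, 120,
          137, 139, 135, 132, 130, 141, 144, 147]

print(format_display(stem_and_leaf(values)))
```

The printed rows reproduce the ordered display of E5, with stems 10 to 14 and leaves in ascending order.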
E6) After arranging the given data in ascending order of magnitude, we have
10, 10, 11, 12, 12, 12, 13, 13, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 16, 16,
16, 16, 17, 17, 17, 17, 18, 18, 19, 19, 19, 19, 20, 20, 21, 21, 22, 23, 25, 31

The smallest observation = $x_s$ = 10

First quartile $Q_1 = \left(\frac{40+1}{4}\right)^{\text{th}}$ item
= 10.25th item = 10th item + 0.25 (11th item - 10th item)
= 14 + 0.25 (14 - 14) = 14

Second quartile $Q_2 = \left(2 \times \frac{40+1}{4}\right)^{\text{th}}$ item
= 20.5th item = mean of the 20th and 21st items
= $\frac{16 + 16}{2}$ = 16

Third quartile $Q_3 = \left(3 \times \frac{40+1}{4}\right)^{\text{th}}$ item
= 30.75th item = 30th item + 0.75 (31st item - 30th item)
= 19 + 0.75 (19 - 19) = 19

The largest observation = $x_l$ = 31

Using the above calculations, the box plot based on the five-number summary
(i.e. smallest observation $x_s$, first quartile ($Q_1$), second quartile ($Q_2$),
third quartile ($Q_3$), largest observation $x_l$) is given below:

E7) After arranging the given data in ascending order of magnitude, we have

22, 25, 26, 26, 27, 27, 28, 31, 31, 33, 33, 34, 37, 39, 42, 44, 46, 48, 50

The smallest observation = $x_s$ = 22

First quartile $Q_1 = \left(\frac{19+1}{4}\right)^{\text{th}}$ item = 5th item = 27

Second quartile $Q_2 = \left(2 \times \frac{19+1}{4}\right)^{\text{th}}$ item = 10th item = 33

Third quartile $Q_3 = \left(3 \times \frac{19+1}{4}\right)^{\text{th}}$ item = 15th item = 42

The largest observation = $x_l$ = 50

Using the above calculations, the box plot based on the five-number summary
(i.e. smallest observation $x_s$, first quartile ($Q_1$), second quartile ($Q_2$),
third quartile ($Q_3$), largest observation $x_l$) is given below:
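
The five-number summary used in E6 and E7 places the j-th quartile at the j(n + 1)/4-th item of the ordered data and interpolates when that position is fractional. A minimal Python sketch of this rule follows; the function names `item_at` and `five_number_summary` are hypothetical, and the data are those of E7.

```python
def item_at(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    k = int(position)               # whole part of the position
    frac = position - k             # fractional part of the position
    lower = sorted_values[k - 1]
    if frac == 0:
        return lower
    return lower + frac * (sorted_values[k] - lower)


def five_number_summary(values):
    """Return (smallest, Q1, Q2, Q3, largest) with Qj at the j(n+1)/4-th item."""
    data = sorted(values)
    n = len(data)
    if n < 3:
        # For fewer than three values the quartile positions fall outside 1..n.
        raise ValueError("need at least three observations")
    q1, q2, q3 = (item_at(data, j * (n + 1) / 4) for j in (1, 2, 3))
    return data[0], q1, q2, q3, data[-1]


# Data of exercise E7.
e7 = [31, 42, 22, 27, 33, 27, 37, 28, 34, 44,
      25, 39, 26, 31, 26, 33, 46, 48, 50]

print(five_number_summary(e7))      # prints (22, 27, 33, 42, 50)
```

These five values are exactly what is needed to draw the box (from $Q_1$ to $Q_3$, with a line at $Q_2$) and the whiskers (out to the smallest and largest observations).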
