MST 001 Block 4 WWW - Khoji.net 18428
Indira Gandhi National Open University
School of Sciences
Foundation in Mathematics and Statistics
Block 4
PRESENTATION OF DATA
UNIT 13 Classification and Tabulation of Data
UNIT 14 Diagrammatic Presentation of Data
UNIT 15 Graphical Presentation of Data-I
UNIT 16 Graphical Presentation of Data-II
Curriculum and Course Design Committee
Prof. K.R. Srivathsan, Pro-Vice Chancellor, IGNOU, New Delhi
Prof. Rahul Roy, Maths and Stat. Unit, Indian Statistical Institute, New Delhi
Block Production
Mr. Y. N. Sharma, SO (P), School of Sciences, IGNOU
CRC prepared by Mr. Rajesh Kaliraman, SOS, IGNOU and Ms. Preeti
Acknowledgement: We gratefully acknowledge Prof. Geeta Kaicker, Director, School of Sciences and
Prof. Parvin Sinclair, Director, NCERT for reading the course material and providing their valuable
suggestions to improve the Course.
March, 2012
© Indira Gandhi National Open University, 2012
ISBN – 978-81-266-5973-9
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other
means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the
University’s office at Maidan Garhi, New Delhi-110 068.
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi by Director,
School of Sciences.
Printed at: Gita Offset Printers Pvt. Ltd., C-90, Okhla Indl. Area-I, New Delhi-20
BLOCK 4 PRESENTATION OF DATA
In the previous block, we became familiar with the origin, development,
definition and importance of statistics and its applications in different areas. We
also discussed the collection of data and the preparation of questionnaires to
collect information. After collecting the information in terms of data, we may
like to arrange the collected data in a proper manner because the collected data
may be huge in volume. So, the need for proper arrangement of data
arises. In statistical terminology, the proper arrangement of data is known as
presentation of data. In this block, we shall learn some basic tools to
represent the collected data. The frequently used tools for
representing data may be classified into three basic forms: statistical
tables, diagrams and graphs. This block is devoted to discussing them. The
flow of the block is maintained by the following four units.
Unit 13: After the collection of data, the next step is classification followed by
tabulation of data. Unit 13 discusses what we mean by classification
and tabulation of data.
Unit 14: A pictorial presentation of the tabulated data may be made either with
the help of different kinds of diagrams or by graphs. This unit discusses
some commonly used diagrams, while Unit 15 and Unit 16 are devoted to
different types of graphical presentation of the data. That is, graphs for
frequency distributions, graphs for time series data, stem-and-leaf displays and
box plots are discussed in Unit 15 and Unit 16.
Notations and Symbols
f : frequency
N = Σf : total of all frequencies
C.I. : class interval
°C : degree Celsius
UNIT 13 CLASSIFICATION AND TABULATION OF DATA
Structure
13.1 Introduction
Objectives
13.2 Classification of Data
13.3 Tabulation of Data
13.4 Summary
13.5 Solutions/Answers
13.1 INTRODUCTION
In Unit 12 of Block-3 of this course, we discussed some methods of data
collection, whether the target population from which the information was collected
was small or large. After the collection of data, the next step is to classify the data in
such a manner that they become ready for proper presentation.
The need for proper presentation arises because statistical data in
their raw form almost defy comprehension. When data are presented in an
easy-to-read form, they help the reader to acquire knowledge in a much shorter
period of time and also facilitate statistical analysis.
A statistical table is a presentation of numbers in a logical arrangement, with
some brief explanation to show what they are. However, before tabulating data,
it is often necessary to first classify them. So, the concept of classification is
described in Sec. 13.2 of the unit and that of tabulation is discussed in Sec. 13.3.
Objectives
After studying this unit, you should be able to:
classify a data set according to the nature of the data;
construct a discrete frequency distribution for a discrete type of data;
construct a continuous frequency distribution for a continuous type of data;
classify the collected data according to the class intervals; and
arrange the data into a suitable form of a table.
Time series data are usually listed in chronological order, normally in
ascending order of time, like 2001, 2002, … . When the major emphasis falls on
the most recent events, a reverse time order may be used.
(iii) Quantitative Classification
Quantitative classification refers to the classification of data according to some
characteristics that can be measured numerically such as height, weight,
income, age, sales, etc. For example, the employees of an institute may be
classified according to their pay scales as follows:
Table-13.3: Quantitative Classification of 840 Employees According to their Pay Scales
Scale of Pay Number of Employees
9300 - 34800 467
15600 - 39100 215
37400 - 67000 158
Total 840
A bar (|) called a tally mark is put against the number when it occurs. After
putting this mark four times against the value, a cross tally is put on these 4
tallies for the fifth mark as shown in the above table. From the sixth mark
onwards, we start afresh in the similar manner. This technique facilitates easy
counting of the tally marks at the end. The presentation of the data as given in
Table 13.4 is known as frequency distribution.
A frequency distribution refers to data which are classified on the basis of
some variable that can be measured, such as wages, age of children, etc. A
variable refers to the characteristic that varies in magnitude in a frequency
distribution. It may be either discrete or continuous. A discrete variable is one
which generally takes integer values, for example, the number of students, the
number of books, etc. A continuous variable can take integer or fractional
values within the range of possibilities, such as the height or weight of
individuals. Generally speaking, continuous data are obtained through
measurements while discrete data are derived by counting. A series described
by a continuous variable is called a continuous series. Similarly, a series
represented by a discrete variable is called a discrete series.
According to the nature of the variable, the frequency distribution may be of
two types, i.e. discrete frequency distribution and continuous frequency
distribution. Let us discuss them one by one.
Table 13.6: Frequency Distribution of Heights of 50 Persons
Heights (cm)   Tally Mark        Frequency
120-130        |||               3
130-140        ||||              5
140-150        |||| ||||         10
150-160        |||| |||| ||||    14
160-170        |||| |||| ||      12
170-180        ||||              4
180-190        ||                2
Total                            50
After discussing the discrete and continuous frequency distributions, let us
discuss the relative and cumulative frequency distributions, which are equally
important from the point of view of data analysis.
Relative Frequency Distribution
A relative frequency corresponding to a class is the ratio of the frequency of
that class to the total frequency. The corresponding frequency distribution is
called relative frequency distribution. If we multiply each relative frequency by
100, we get the percentage frequency corresponding to that class and the
corresponding frequency distribution is called “Percentage frequency
distribution”. Let us take an example in which both relative and percentage
frequency distributions are prepared.
Example 1: A frequency distribution of marks of 50 students in a subject is as
given below:
Class (Marks): 0-10 10-20 20-30 30-40 40-50
Frequency: 6 10 14 18 2
Prepare relative and percentage frequency distributions.
Solution: The relative and percentage frequency distributions can be formed as
given in the following table:
Class (Marks) X   Frequency (f)   Relative frequency (f/N)   Percentage frequency (f/N) × 100
0-10              6               6/50 = 0.12                0.12 × 100 = 12 %
10-20             10              10/50 = 0.20               0.20 × 100 = 20 %
20-30             14              14/50 = 0.28               0.28 × 100 = 28 %
30-40             18              18/50 = 0.36               0.36 × 100 = 36 %
40-50             2               2/50 = 0.04                0.04 × 100 = 4 %
Total             Σf = N = 50     1.00                       100
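The arithmetic of Example 1 can be checked with a short Python sketch. The variable and function names below are illustrative choices, not part of the course material; exact fractions are used so that the percentages come out without rounding noise.

```python
from fractions import Fraction

# Class intervals and frequencies from Example 1.
classes = ["0-10", "10-20", "20-30", "30-40", "40-50"]
frequencies = [6, 10, 14, 18, 2]

def relative_frequencies(freqs):
    """Return f/N for each class as an exact fraction, where N is the total frequency."""
    total = sum(freqs)
    return [Fraction(f, total) for f in freqs]

relative = relative_frequencies(frequencies)
# Percentage frequency = relative frequency x 100.
percentage = [r * 100 for r in relative]

for label, f, r, p in zip(classes, frequencies, relative, percentage):
    print(f"{label:>6}  f = {f:>2}  f/N = {float(r):.2f}  percentage = {float(p):g} %")
```

The relative frequencies always add up to 1 and the percentage frequencies to 100, which is a quick check on any such table.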
Cumulative Frequency Distribution
The cumulative frequency of a class is the total of all the frequencies up to and
including that class. A cumulative frequency distribution is a frequency
distribution which shows the observations ‘less than’ or ‘more than’ a specific
value of the variable.
The number of observations less than the upper class limit of a given class is
called the less than cumulative frequency and the corresponding cumulative
frequency distribution is called less than cumulative frequency distribution.
Similarly, the number of observations more than
the lower class limit of a given class is called the more than cumulative frequency
and the corresponding cumulative frequency distribution is called the ‘more than’
cumulative frequency distribution. Following is an example, wherein ‘less
than’ and ‘more than’ cumulative frequency distributions have been obtained.
Example 2: For the following frequency distribution of marks of 50 students in
a subject, form both types of cumulative frequency distributions.
Class (Marks) 0-10 10-20 20-30 30-40 40-50
No. of Students 7 11 15 12 5
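Both types of cumulative frequencies for the data of Example 2 can be computed as running totals. The following Python sketch is only an illustration; the names used are assumptions made for this example.

```python
from itertools import accumulate

# Class limits and frequencies from Example 2.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
students = [7, 11, 15, 12, 5]

# 'Less than' type: running total from the first class, read against upper limits.
less_than = list(accumulate(students))

# 'More than' type: running total from the last class, read against lower limits.
more_than = list(accumulate(reversed(students)))[::-1]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"less than {upper:>2}: {lt:>2}    more than {lower:>2}: {mt:>2}")
```

The last 'less than' value and the first 'more than' value both equal the total frequency, 50.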
following example there are 24 students who have secured marks between
0 and 50. A student who secured 20 marks would be included in class 20-30,
not in 10-20. This method is widely followed in practice.
Example 3: 24 students appeared in an entrance test in which all questions were
of objective type with 25% negative marking. The marks obtained out of 50 maximum
marks are as follows:
17, 16, 7, 30, 21, 42, 44, 36, 22, 22, 25, 31, 31, 34, 30, 36, 35, 45, 25, 15,
20, 42, 40, 30
Prepare a frequency distribution by using exclusive method.
Solution: The frequency distribution of the marks obtained by the above 24 students,
prepared by using the exclusive method, is given in Table 13.8:
Table 13.8: Frequency Distribution of 24 Students by Exclusive Method
Classes   Tally bar    No. of Students
0-10      |            1
10-20     |||          3
20-30     |||| |       6
30-40     |||| ||||    9
40-50     ||||         5
Total                  24
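The counting behind Table 13.8 can be reproduced with a small Python sketch of the exclusive method, in which a value equal to an upper class limit goes to the next class. The function name and its parameters are illustrative assumptions.

```python
# Marks of the 24 students from Example 3.
marks = [17, 16, 7, 30, 21, 42, 44, 36, 22, 22, 25, 31, 31, 34, 30, 36, 35, 45, 25, 15,
         20, 42, 40, 30]

def classify_exclusive(values, start, width, n_classes):
    """Count values in classes [start, start + width), [start + width, ...), and so on.

    The lower limit is included in a class and the upper limit is excluded,
    so a mark of 20 falls in 20-30 and not in 10-20.
    """
    counts = [0] * n_classes
    for value in values:
        index = (value - start) // width
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} lies outside the class intervals")
        counts[index] += 1
    return counts

counts = classify_exclusive(marks, start=0, width=10, n_classes=5)
for k, count in enumerate(counts):
    print(f"{10 * k}-{10 * (k + 1)}: {count}")
```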
Inclusive Method
Under the inclusive method of classification, both the lower class limit and
the upper class limit of a class are included in that class itself. The following frequency
distribution is formed using the inclusive method for the data of Example 3 given
above.
Table 13.9: Frequency Distribution of 24 Students by Inclusive Method
Class     Tally bar    No. of Students
0-9       |            1
10-19     |||          3
20-29     |||| |       6
30-39     |||| ||||    9
40-49     ||||         5
Total                  24
That means if data are classified in such a way that the lower as well as the
upper class limits are included in the same class interval, it is called inclusive
class interval.
To convert data from inclusive form to exclusive form, we first find
half of the difference between the lower limit of a class and the upper limit of the
preceding class. This value is then subtracted from the lower limit of each class
and added to the upper limit of each class. In the above example, this value is
(10 − 9)/2 = 0.5. So, the class intervals are −0.5-9.5,
9.5-19.5, …, 39.5-49.5. If all the observations of the data are positive, then the
lower limit of the first class can be taken as 0. Therefore, in this case the class
intervals are 0-9.5, 9.5-19.5, …, 39.5-49.5.
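The conversion from inclusive to exclusive class intervals can be sketched in Python as follows. The function name is a hypothetical choice for this illustration, and the sketch assumes that the gap between consecutive classes is the same throughout.

```python
def to_exclusive(inclusive_classes):
    """Convert inclusive class limits to exclusive class boundaries.

    Half of the gap between the lower limit of the second class and the upper
    limit of the first class is subtracted from every lower limit and added to
    every upper limit. Equal gaps between all classes are assumed.
    """
    correction = (inclusive_classes[1][0] - inclusive_classes[0][1]) / 2
    return [(lower - correction, upper + correction)
            for lower, upper in inclusive_classes]

inclusive = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]
exclusive = to_exclusive(inclusive)
print(exclusive)
```

After the conversion, the upper boundary of each class coincides with the lower boundary of the next, so the classes become continuous.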
Remark
(i) The lower limit of a class interval is always included in the class in both the
methods discussed above.
(ii) In the exclusive method, the upper limit of a class is not included in the class.
That is why the name exclusive.
(iii) In the inclusive method, the upper limit of a class is also included in the class.
That is why the name inclusive.
Table 13.10: Frequency Distribution of 24 Persons by Inclusive Method
Classes      No. of Students
−0.5-9.5     01
9.5-19.5     03
19.5-29.5    06
29.5-39.5    09
39.5-49.5    05
Total        24
(5) The intervals of all the classes should be of the same size, because if the
class intervals are not of the same width, it is difficult to make
meaningful comparison between classes. Sometimes the data may require
the inclusion of so many class intervals that the frequency distribution
will become large. Then the classification may be done as follows:
below 10
10-20
20-30
30-40
above 40
These classes are called open end classes and the distribution is known
as open end frequency distribution.
It may be noted that the frequency distributions, like other types of data
presentation, are always constructed to serve some specific purpose. The
technical requirements outlined above must be supplemented by sound
subjective judgments if proper frequency distributions are to be formed.
After learning so much about classification of data, you will have realised the
importance of classification. So, before moving to the next section, let us
outline some of the main points related to the importance of
classification:
It is a preliminary step for further statistical analysis,
It facilitates comparison and makes it easy to draw conclusions,
It facilitates tabulation.
Now, you can try the following exercises.
E5) The marks of 30 students in statistics are given below:
10, 12, 25, 32, 27, 32, 38, 43, 39, 55, 29, 38, 57, 08, 06, 13, 27, 25, 29, 53,
55, 45, 35, 48, 47, 59, 15, 19, 48, 55
Classify the above data by taking a suitable class interval.
E6) Present the following data of the profits (in crores of Rs.) of 60
companies in the year 2009-10:
41, 17, 83, 63, 55, 92, 60, 58, 70, 06, 67, 82, 33, 44, 57, 49, 34, 73, 54, 63,
36, 52, 32, 75, 60, 33, 09, 79, 28, 30, 42, 93, 43, 80, 03, 32, 57, 67, 84, 64,
63, 11, 35, 28, 10, 23, 08, 41, 60, 32, 72, 53, 92, 88, 62, 55, 60, 33, 40, 57
Classify the data by the inclusive method.
E7) Present the data given in E6 using the principle of adding
and subtracting the correction factor.
13.3 TABULATION OF DATA
One of the simplest and most revealing devices for summarising and presenting
data in a meaningful arrangement is the statistical table. We can also define a
statistical table as the logical listing of quantitative data in columns and rows of
numbers with sufficient explanatory statements. The statements may be given
in the form of titles, headings and notes to make clear the full meaning of the data
and their origin.
In other words, a table is a systematic arrangement of statistical data in
columns and rows. Rows are horizontal arrangements, whereas columns are
vertical ones. A table simplifies the presentation and facilitates
comparison. The simplification results from the clear-cut and systematic
arrangement, which enables the reader to quickly locate the desired
information. Comparison is facilitated by bringing related items of information
close together.
13.3.1 Components of a Table
The various components of a table may vary case to case depending upon the
given data. But a good table must contain at least the following components:
1. Table Number
2. Table Heading
3. Caption
4. Stub
5. Body of Table
6. Head Note
7. Foot Note
Let us throw some light on these components one by one:
1. Table Number
A statistical table should be numbered. There are different conventions regarding
the place where the table number is given. The table number may be shown
either in the centre at the top above the title or on the left-hand side of the table
at the top. When there are many columns, it is desirable to number each
column so that easy reference to it is possible.
2. Table Heading
A good table should have a suitable heading. The heading is a brief description
of the contents of the table. It should be placed above the table. It should
answer the following questions:
(a) What categories of statistical data are shown?
(b) Where did the data occur?
(c) When did the data occur?
In other words, the heading of the table should be clear, brief and
self-explanatory, but sometimes a long title may have to be used for the sake of
clarity. The title should be so worded that it permits one and only one
interpretation.
3. Caption
Caption refers to the column heading, and explains what information the column
presents. It may consist of one or more column headings, i.e. under a column
heading there may be two or more sub headings. The caption should be clearly
defined and placed at the middle of the column. If the different columns are
expressed in different units, the unit should be specified along with the
captions.
4. Stub
The stubs are row headings. They are placed at the extreme left of the table and
perform the same function for the horizontal rows in the table as the captions
do for the vertical columns.
5. Body
The body of the table is the central part of the table that contains the numerical
information presented in the table. This is the most vital part of the table.
6. Head Note
Head note is a brief explanatory statement applying to all or a major part of the
material presented in the table. It is placed below the title, centred and
enclosed in brackets. It is used to explain certain points relating to the whole
table that have not been included in the title, the captions or the stubs. For
example, the unit of measurement is frequently written as the head note, such as
“in thousands” or “million tons” or “in crores”, etc.
7. Foot Note
Anything in a table which the reader is likely to find difficult to understand
should be explained in footnotes. Footnotes may be placed directly below the
body of the table. Footnotes are generally used for the following purposes:
(a) To point out any special circumstances affecting the data, for example, strike, fire,
etc.
(b) To clarify anything in the table.
(c) To give the source in case of secondary data. If any information in the
table is obtained from some journal, its name, date of publication, page
number, table number, etc. should be mentioned so that if the user wishes
to check the data from the original source, he or she knows where to look
for the information.
After discussing the parts of a table, let us discuss different kinds of tables,
through which we can represent or arrange different types of information.
13.3.2 Types of Tables
Tables may broadly be classified into following two categories.
1. Simple and Complex Tables
2. General Purpose and Special Purpose Tables
1. Simple and Complex Tables
The simple and complex tables can be differentiated on the basis of the number of
characteristics presented and studied. If data based on one characteristic are
presented, the table is known as a simple table. The simple table is also known as a
one way table. On the other hand, in a complex table, two or more
characteristics are presented. Complex tables are frequently used in
practice because they make it possible to incorporate full information and to give proper
consideration to all related facts. If the data are tabulated on the basis of only
two characteristics, the table is known as a two way table. If three
characteristics are arranged in a table, the table is known as a treble table.
When four or more characteristics are simultaneously presented, it is known as
manifold tabulation.
The following table presenting the distribution of marks obtained by 100
students in a test is an illustration of a simple table:
Table-13.11: Distribution of Marks Obtained by 100 Students in Statistics
Marks No. of Students
Below 10 5
10-20 8
20-30 12
30-40 10
40-50 15
50-60 18
60-70 17
70-80 13
Above 80 02
Total 100
2. General Purpose and Special Purpose Tables
General purpose tables, also known as reference tables or repository tables,
provide information for general use or reference. They usually contain
detailed information and are not used for specific discussion. In other words,
these tables serve as a repository of information and are arranged for easy
reference such as the tables published by government agencies, the tables
contained in the statistical abstract of the Indian Union, tables in the census
reports, etc.
The general tables tell facts which are not for particular discussion. If general
tables are used by a researcher, they are usually placed in the form of appendix
at the end of the report for easy reference.
13.4 SUMMARY
In this unit we have covered the concepts of classification and tabulation of
data. That is we have discussed:
13.5 SOLUTIONS/ANSWERS
E1) The classification of the data for the production of wheat according to the
given cities can be done in the following way:
Table 13.13: Geographical Classification of the Production of Wheat
Region        Production of Wheat (in '000 kg)
Agra          376
Bhopal        230
Chandigarh    583
Mumbai        136
E2) Classification of the profits of a company from 2001 to 2010 can be done
in the following way:
Table 13.14: Chronological Classification of Profits from 2001 to 2010
Year   Profits (in crores of rupees)   Year   Profits (in crores of rupees)
2001   10                              2006   16
2002   15                              2007   17
2003   13                              2008   21
2004   17                              2009   20
2005   12                              2010   18
E4) The continuous frequency distribution for the given information can be
constructed in the following way:
Table 13.16: Continuous Frequency Distribution of 50 Students According to their
Heights
E5) Let us determine a suitable class interval with the help of the following
formula:
$$i = \frac{\text{Range}}{1 + 3.322 \log N}$$
Range = 59 − 06 = 53, N = 30
$$i = \frac{53}{1 + 3.322 \log 30} = \frac{53}{1 + 4.91} = 8.97 \approx 9$$
Since values like 3, 7, 9, etc. should be avoided, we will take
10 as the class interval. Let us take the first class as 5-15, so that
the following table is formed:
Table 13.17: Continuous Frequency Distribution of 30 Students According to their
Marks
Marks    Tally Mark    Frequency
05-15    ||||          5
15-25    ||            2
25-35    |||| |||      8
35-45    ||||          5
45-55    ||||          5
55-65    ||||          5
Total                  30
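The class width suggested by the formula and the frequencies of Table 13.17 can be verified with the following Python sketch. The function names are illustrative assumptions; the logarithm is taken to base 10, which is what the worked value 3.322 log 30 = 4.91 implies.

```python
import math

# Marks of the 30 students from exercise E5.
marks = [10, 12, 25, 32, 27, 32, 38, 43, 39, 55, 29, 38, 57, 8, 6, 13, 27, 25, 29, 53,
         55, 45, 35, 48, 47, 59, 15, 19, 48, 55]

def suggested_width(values):
    """Class width i = Range / (1 + 3.322 log10 N)."""
    data_range = max(values) - min(values)
    return data_range / (1 + 3.322 * math.log10(len(values)))

def count_exclusive(values, start, width, n_classes):
    """Frequencies for exclusive classes of equal width beginning at `start`.

    Every value is assumed to lie between start and start + width * n_classes.
    """
    counts = [0] * n_classes
    for value in values:
        counts[(value - start) // width] += 1
    return counts

width = suggested_width(marks)
counts = count_exclusive(marks, start=5, width=10, n_classes=6)
print(round(width, 2))   # suggested width before rounding to a convenient value
print(counts)            # frequencies for 5-15, 15-25, ..., 55-65
```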
E6) As the least value is 3 and the highest value is 93, using
$$i = \frac{\text{Range}}{1 + 3.322 \log N} = \frac{93 - 3}{1 + 3.322 \log 60} = 13.03 \approx 13$$
Since values like 3, 7, 9, 11, 13, etc. should be avoided, we
will take 14 as the class interval. Let us take the first class as 0-14,
so that the following table is formed.
Table 13.18: Continuous Frequency Distribution of 60 Companies According to their
Profits
E7) Table 13.19 given below illustrates the classification of the
data according to the exclusive method and the principle of correction factor
in classification.
Table 13.19: Continuous Frequency Distribution of 60 Companies
According to their Profits
Profits (in crores of Rs.)   Tally Mark            Frequency
−0.5-14.5                    |||| |                06
14.5-29.5                    ||||                  04
29.5-44.5                    |||| |||| |||| |      16
44.5-59.5                    |||| ||||             10
59.5-74.5                    |||| |||| ||||        14
74.5-89.5                    |||| ||               07
89.5-94.5                    |||                   03
Total                                              60
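The frequencies of Table 13.19 can be cross-checked by counting the E6 data against the class boundaries. The Python sketch below is illustrative only: the function name is an assumption, and the first boundary is taken as −0.5, i.e. the lower limit 0 less the correction factor 0.5.

```python
from bisect import bisect_right

# Profits (in crores of Rs.) of the 60 companies from exercise E6.
profits = [41, 17, 83, 63, 55, 92, 60, 58, 70, 6, 67, 82, 33, 44, 57, 49, 34, 73, 54, 63,
           36, 52, 32, 75, 60, 33, 9, 79, 28, 30, 42, 93, 43, 80, 3, 32, 57, 67, 84, 64,
           63, 11, 35, 28, 10, 23, 8, 41, 60, 32, 72, 53, 92, 88, 62, 55, 60, 33, 40, 57]

# Class boundaries after applying the correction factor 0.5 to 0-14, 15-29, ..., 90-94.
boundaries = [-0.5, 14.5, 29.5, 44.5, 59.5, 74.5, 89.5, 94.5]

def count_by_boundaries(values, bounds):
    """Count how many values fall between each pair of consecutive boundaries."""
    counts = [0] * (len(bounds) - 1)
    for value in values:
        # bisect_right finds the first boundary greater than the value.
        counts[bisect_right(bounds, value) - 1] += 1
    return counts

frequencies = count_by_boundaries(profits, boundaries)
print(frequencies)
```

Because the observations are whole numbers and every boundary ends in .5, no observation can fall exactly on a boundary, which is the practical benefit of the correction factor.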
E8) The following table represents the given
information regarding the drinkers in city X and city Y.
Table 13.20: Presentation of Data regarding the Drinkers in City X and City Y in
the form of Two Way Table
UNIT 14 DIAGRAMMATIC PRESENTATION OF DATA
Structure
14.1 Introduction
Objectives
14.2 Diagrammatic Presentation
14.3 One Dimensional or Bar Diagrams
14.4 Two Dimensional Diagrams
14.5 Pie Diagrams
14.6 Pictogram
14.7 Cartogram
14.8 Choice of a Suitable Diagram
14.9 Summary
14.10 Solutions/Answers
14.1 INTRODUCTION
In Unit 13, we discussed the classification and tabulation of data.
Though these methods are very helpful for an easy and systematic
presentation of the data, people are often not very interested in tables. A
large mass of observations can easily confuse the reader, and
he/she may interpret it wrongly. If data are presented in the form of
diagrams, they attract the reader, who then tries to understand them. Diagrammatic
presentation helps in quick understanding of data. Confirmation of this can be
found in the financial pages of newspapers, journals, advertisements, etc. There
are many methods of representing numerical figures through diagrams, but
sometimes it is very difficult to decide which is the best diagram in a
specific situation.
In this unit, we will discuss one-dimensional, two-dimensional and pie
diagrams. Pictograms and cartograms are also discussed in this unit. The unit
ends with a note on the choice of a suitable diagram in a given situation.
Objectives
After studying this unit, you should be able to:
become familiar with the diagrammatic presentation of data;
draw suitable bar diagrams for given data;
draw rectangle and square diagrams for the given data;
draw pie diagram;
draw pictograms and cartograms; and
select an appropriate diagram to represent data.
too big nor too small. Similar scale is necessary for comparison of diagrams.
Scale should be mentioned clearly at the top of the diagram or below it.
(iv) Footnotes
To clarify certain points about the diagram, footnotes are to be used.
Footnotes may be given at the bottom of the diagram.
(v) Index of Diagram
An Index should be given to illustrate different types of lines or different
types of shades or colours, so that the reader can easily make out the meaning
of the diagram.
(vi) Neat and Clean Diagram
A good diagram should be absolutely neat and clean. Too much information
should not be given in one diagram, otherwise the reader may get confused.
(vii) Simple Diagram
A good diagram should be as simple as possible so that the reader can
understand its meaning clearly, otherwise the complexity can obscure its main
theme.
In the previous two subsections, we have explained the significance of diagrams and the general
rules for their construction. In the next subsection, we will just list the types
of diagrams. Then, in subsequent sections, we will discuss each type of
diagram in detail.
14.2.3 Types of Diagrams
In practice, various types of diagrams are in use and new ones are constantly
being added. For the sake of application and simplicity several types of
diagrams are categorised under the following heads:
(i) One Dimensional Diagrams or Bar Diagrams
(ii) Two Dimensional Diagrams
(iii) Pie Diagrams
(iv) Pictogram
(v) Cartogram
After looking at the merits of bar diagrams, you will be keen to know how
bar diagrams are constructed and how many types of bar diagrams are
generally used. The coming two subsections address these two
questions.
14.3.1 Types of Bar Diagrams
The following are the different types of bar diagrams:
(i) Simple Bar Diagram
(ii) Subdivided Bar Diagram
(iii) Multiple Bar Diagram
(iv) Percentage Bar Diagram
(v) Deviation Bar Diagram
(vi) Broken Bar Diagram
Let us discuss these types of bar diagrams one by one.
(i) Simple Bar Diagram
If data based on one variable are to be represented, the simple bar
diagram can be used. For example, the figures of production, profits, sales,
etc. for various years may be represented with the help of simple bar diagrams.
From a simple bar diagram, the reader can easily see the variation in the
characteristic under study with respect to time or some other given factor,
because the width of each bar is the same and only the lengths of the bars vary. In our
representation, we will take the length of the bars along the vertical axis and the other given
factor along the horizontal axis. Simple bar diagrams are very popular in practice. For example,
while presenting the total turnover of a company for the last five decades, one can
depict only the total turnover amount in a simple bar diagram. Let us
construct a simple bar diagram in the following example.
Example 1: The profits (in Rs crore) of a company from 1990-91 to 1999-2000
are given below:
Year      Profit (in Rs crore)   Year        Profit (in Rs crore)
1990-91   35.6                   1995-96     87.2
1991-92   46.7                   1996-97     113.1
1992-93   39.8                   1997-98     123.6
1993-94   68.2                   1998-99     119.7
1994-95   93.5                   1999-2000   130.8
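As a rough check on the relative lengths of the bars for the data of Example 1, a text version of a simple bar diagram can be produced in Python. This is only a sketch: the bars are horizontal rather than vertical, and the scale of one '#' for every Rs 5 crore and the function name are assumptions made for illustration.

```python
# Profits (in Rs crore) from Example 1.
profits = {
    "1990-91": 35.6, "1991-92": 46.7, "1992-93": 39.8, "1993-94": 68.2,
    "1994-95": 93.5, "1995-96": 87.2, "1996-97": 113.1, "1997-98": 123.6,
    "1998-99": 119.7, "1999-2000": 130.8,
}

def text_bar_diagram(data, unit=5.0):
    """Return one line per bar; every '#' stands for `unit` (assumed: Rs 5 crore)."""
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / unit)
        # The figure is written at the end of the bar so that the reader
        # need not refer to the scale.
        lines.append(f"{label:>9} | {bar} {value}")
    return lines

diagram = text_bar_diagram(profits)
print("\n".join(diagram))
```

All bars share the same width (one text line) and the same scale, so only their lengths vary.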
(ii) Subdivided Bar Diagram
If various components of a variable are to be represented in a single diagram,
then a subdivided bar diagram is made. For example, the
number of members of teaching staff in various departments of an institute
may be represented by a subdivided bar diagram. In this diagram, each bar is divided
according to the number of components. First of all, the cumulative or total
amount is calculated from the amounts of the components. Then the bar is divided
with respect to the magnitude of the components. The length of the bar is equal
to the total of the amounts of the components.
Within a bar, the components are arranged in order of magnitude, from the largest
at the base of the bar to the smallest at the end of the bar, and the same order of
components is kept in every bar. Different shades or colours
are used to distinguish between different components. To explain such
differences, an index should be used in the bar diagram.
Subdivided bar diagrams can be drawn vertically or horizontally. If the
number of components is more than 10 or 12, subdivided bar diagrams
are not used because in that case the diagram would be overloaded with
information and could not easily be compared and understood. Let us see how a
subdivided bar diagram is constructed with the help of the following example:
Example 2: Represent the following data by subdivided bar diagram:
Category               Cost per chair (in Rs), year-wise
                       1990   1995   2000
Cost of Raw Material   15     20     30
Labour Cost            15     18     25
Polish                 5      6      15
Delivery               5      6      10
Total                  40     50     80
Solution: First of all we calculate the cumulative cost on the basis of the given
amounts:
Category               1990                     1995                     2000
                       Cost      Cumulative     Cost      Cumulative     Cost      Cumulative
                       (in Rs)   Cost (in Rs)   (in Rs)   Cost (in Rs)   (in Rs)   Cost (in Rs)
Cost of Raw Material   15        15             20        20             30        30
Labour Cost            15        30             18        38             25        55
Polish                 5         35             6         44             15        70
Delivery               5         40             6         50             10        80
Total                  40                       50                       80
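The cumulative costs in the table above mark the heights at which each bar is divided. As a cross-check, they can be recomputed with a few lines of Python; the dictionary layout and names below are assumptions made for this sketch.

```python
from itertools import accumulate

# Component costs per chair (in Rs) from Example 2, listed from the base of the bar upwards.
components = ["Cost of Raw Material", "Labour Cost", "Polish", "Delivery"]
costs = {
    1990: [15, 15, 5, 5],
    1995: [20, 18, 6, 6],
    2000: [30, 25, 15, 10],
}

# The running total after each component gives the level of each subdivision.
cumulative = {year: list(accumulate(values)) for year, values in costs.items()}

for year, levels in cumulative.items():
    print(year, dict(zip(components, levels)))
```

The last cumulative value for each year equals the total cost, which is the full length of that year's bar.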
On the basis of the above table, the required subdivided bar diagram is given below:
(iii) Multiple Bar Diagram
In a multiple bar diagram, we construct two or more bars together. The
multiple bars are constructed either for the different components of the total or
for the magnitudes of the variables. All the bars of one group of data are drawn
together so that the bars of different groups can be compared
properly. The height of each bar is the magnitude of the component to be
presented, as in a simple bar diagram. In this diagram, a space is left
between the vertical axis and the first bar of the first group of bars, but no
space is left between the bars of the same group. A space must also be left
between the bars of two different groups of bars.
In multiple bar diagrams, two or more groups of interrelated data are presented.
The technique of drawing such diagrams is the same as that of the simple
bar diagram. The only difference is that, since more than one component is
represented in each group, different shades, colours, dots or crossings are
used to distinguish between the bars of the same group, and the same symbols are
used for the corresponding components of the other groups. Multiple bar
diagrams are very useful in situations where either the number of related
components is large or the change in the values of the components of one
variable is important. The following example illustrates how a multiple bar
diagram is drawn for given data.
Example 3: Draw the multiple bar diagram for the following data.
Year   Sale (in '000 Rs)   Gross profit (in '000 Rs)   Net profit (in '000 Rs)
1990   100                 30                          10
1995   120                 40                          15
2000   130                 45                          25
2005   150                 50                          30
2010   200                 70                          30
Solution: Multiple bar diagram for the above data is given below.
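The spacing rule for a multiple bar diagram (a gap before the first group and between groups, none within a group) can be sketched as a small layout calculation. The bar width and gap of one unit each, and the helper name `bar_layout`, are assumptions made only for this illustration.

```python
# Data of Example 3 (values in '000 Rs).
years = [1990, 1995, 2000, 2005, 2010]
series = {
    "Sale": [100, 120, 130, 150, 200],
    "Gross profit": [30, 40, 45, 50, 70],
    "Net profit": [10, 15, 25, 30, 30],
}

BAR_WIDTH = 1.0   # assumed: every bar has the same width
GROUP_GAP = 1.0   # assumed: gap before the first group and between groups


def bar_layout(years, series, bar_width=BAR_WIDTH, group_gap=GROUP_GAP):
    """Return (year, component, left_edge, height) for every bar."""
    bars = []
    group_width = bar_width * len(series)
    for g, year in enumerate(years):
        # Left edge of the whole group; groups are separated by group_gap.
        group_left = group_gap + g * (group_width + group_gap)
        for k, (name, values) in enumerate(series.items()):
            # Bars of the same group touch each other (no gap inside a group).
            bars.append((year, name, group_left + k * bar_width, values[g]))
    return bars


layout = bar_layout(years, series)
for bar in layout[:6]:
    print(bar)
```

Each tuple can be handed to any drawing tool; the same component should get the same shade or colour in every group.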
Solution: The sale of cars in year 2000 is almost 14 times that in year
1950. In order to gain space for the sale figure of year 1950, we have to use a
broken bar to represent the sale of cars for year 2000. Subdivided bar diagram
for the given data is shown below.
14.3.1 Principles of Construction of Bar Diagrams
(i) The width of each bar must be uniform in a diagram.
(ii) The gap between two bars should be uniform throughout the diagram.
(iii) Bars may be either horizontal or vertical. The vertical bars should be
preferred because they give a better look than horizontal bars and also
facilitate comparison. We will use vertical bars in our presentation.
(iv) The respective figures should also be written at the top of the bars so that
the reader may be able to know the precise value without looking at the scale.
Now, you can try the following exercises:
E1) Represent the following data by a suitable diagram:
Years:                        2005  2006  2007  2008  2009  2010
Enrollment of the students:   280   294   302   270   325   406
E2) Represent the following data by a suitable bar diagram:
Year: 2007-08 2008-09 2009-10
Gross Income: 440 480 520
Gross Expenditure: 410 440 490
Net Income: 160 180 175
Tax: 180 165 190
E3) Represent the following information by a suitable diagram:
Class Average marks Average Marks Average Marks
in Mathematics in Statistics in Physics
A 58 70 65
B 62 68 72
E4) Draw a suitable diagram for given expenditure data of two families.
Item Family A Family B
Food 300 350
Clothing 250 200
Education 280 300
Others 220 200
E5) Draw a suitable diagram to represent the following information:
Item Company A Company B
Selling Price 9500 8000
Raw Material 5500 6500
Direct Wages 3500 4000
Rent of Office 1500 1500
$(\text{side})^2 \propto$ given numerical figure $\Rightarrow$ side of square $\propto \sqrt{\text{given numerical figure}}$
Remember that the base line would be same for all squares.
In other words, we follow the following steps for the construction of the square
diagram:
Step 1 Take the given numerical observations/figures as areas of the
corresponding squares.
Step 2 Take square roots of the given numerical observations/figures as sides
of the corresponding squares.
Step 3 Construct the corresponding squares like rectangle diagrams.
Let us discuss the method of drawing the square diagram with the help of the
following example:
Example 8: Represent the following data of the number of schools in a city A
from 1970-80 to 2000-10 in a square diagram.
Years 1970-80 1980-90 1990-2000 2000-10
Number of 4 9 36 64
schools in city A
Solution:
Step 1 Areas of the corresponding squares = 4, 9, 36, 64
Step 2 Sides of the corresponding squares = $\sqrt{4}, \sqrt{9}, \sqrt{36}, \sqrt{64}$ = 2, 3, 6, 8
Step 3 Square diagram for the given data is shown below.
Remark 1: If the given observations are large, so that their square roots are
also large, we can adjust the scale in the usual way.
For example, suppose the given observations are 256, 1600, 5184, 9216; then the
sides of the squares will be
$\sqrt{256}, \sqrt{1600}, \sqrt{5184}, \sqrt{9216} = 16, 40, 72, 96.$
Here we can adjust the scale by taking 16 units = 1 unit; after this, the sides
of the squares reduce to 1, 2.5, 4.5, 6. Using these sides, we can construct the
square diagram as done in the above example, provided we mention the scale used
(i.e. 16 units = 1 unit along both axes) in the top right corner.
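The side lengths used in Example 8 and in Remark 1 can be reproduced as follows. This is a minimal sketch; the function name `square_sides` and its `units_per_side` parameter are introduced here only for illustration.

```python
import math


def square_sides(observations, units_per_side=1.0):
    """Side of each square: square root of the observation, rescaled."""
    return [math.sqrt(x) / units_per_side for x in observations]


# Example 8: number of schools taken as areas of the squares.
sides_example = square_sides([4, 9, 36, 64])

# Remark 1: large observations, with the scale 16 units = 1 unit.
sides_scaled = square_sides([256, 1600, 5184, 9216], units_per_side=16)

print(sides_example)  # [2.0, 3.0, 6.0, 8.0]
print(sides_scaled)   # [1.0, 2.5, 4.5, 6.0]
```

Rescaling divides every side by the same number, so the ratios between the areas of the squares are unchanged.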
(iii) Circles Diagram
Another form of two dimensional diagram is the circles diagram. In the square
diagram we took the given numerical figures/observations as the areas of the
corresponding squares. Similarly, here we take the given numerical
figures/observations as the areas of the corresponding circles. But, as we know,
Area (A) of a circle = $\pi r^2$, where r is the radius of the circle
$\Rightarrow A \propto r^2$, read as "A is proportional to $r^2$"
(if y = ax, where a is a constant, then we say that y is proportional to x)
$\Rightarrow r^2 \propto$ given numerical figures/observations
i.e. $r_1, r_2, r_3, r_4 = \sqrt{4}, \sqrt{9}, \sqrt{36}, \sqrt{64}$ = 2, 3, 6, 8
Step 4 Circles diagram for the given data is shown on the next page. Radii of
the circles lie on the dotted line.
Number of 16 25 65 150
colonies in city A
of other head(s) must spread so that total remains 100%. That is why pie chart
gives relationship between whole and its parts.
Steps used for constructing a pie chart.
Step 1 Find the total of different parts.
Step 2 Find the sector angles (in degrees) of each part keeping in mind that
the total angle around the centre of a circle is 360°.
Step 3 Find the percentage of each part taking the total obtained in step 1 as
100 percent.
Step 4 Draw a circle and divide it into sectors, where each sector (or area of
the sector) of the circle with corresponding angles obtained in step 2
will represent the size of corresponding parts. Diagram thus obtained is
nothing but pie chart fitted to the given data.
Let us explain the procedure with the help of the following example:
Example 10: A company is started by the four persons A, B, C and D and
they distribute the profit or loss between them in proportion of 4 : 3 : 2 : 1 . In
year 2010 company earned a profit of Rs 14400. Represent the shares of their
profits in a pie chart.
Solution: Given ratio is 4 : 3 : 2 : 1
sum of ratios = 4 + 3 + 2 + 1 = 10
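Calculation of Degrees and Percentages
Partner   Ratio   Profit (in Rs)            Angle (in degrees)     Percentage
A         4       (4/10) × 14400 = 5760     (4/10) × 360 = 144     (4/10) × 100 = 40
B         3       (3/10) × 14400 = 4320     (3/10) × 360 = 108     (3/10) × 100 = 30
C         2       (2/10) × 14400 = 2880     (2/10) × 360 = 72      (2/10) × 100 = 20
D         1       (1/10) × 14400 = 1440     (1/10) × 360 = 36      (1/10) × 100 = 10
Total     10      14400                     360                    100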
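The degrees and percentages for Example 10 follow directly from the ratio 4 : 3 : 2 : 1 and the profit of Rs 14400. The sketch below recomputes them; the dictionary layout and the name `rows` are illustrative choices, not part of the original solution.

```python
# Profit-sharing ratio of the four partners and the profit for 2010 (in Rs).
ratio = {"A": 4, "B": 3, "C": 2, "D": 1}
total_profit = 14400

total_parts = sum(ratio.values())  # 4 + 3 + 2 + 1 = 10

rows = {}
for partner, parts in ratio.items():
    rows[partner] = {
        "profit": total_profit * parts // total_parts,  # share of the profit
        "angle": 360 * parts / total_parts,             # sector angle in degrees
        "percent": 100 * parts / total_parts,           # share as a percentage
    }

for partner, row in rows.items():
    print(partner, row)
```

The angles add up to 360 and the percentages to 100, which is a quick check before drawing the sectors.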
Solution: On the basis of the above calculation, the pie chart which shows the
shares of profit of the four partners is as follows:
[Pie chart: Partner A 40%, Partner B 30%, Partner C 20%, Partner D 10%]
Note:
(i) In drawing the components on the pie diagram, it is advised to follow
some logical arrangement, pattern or sequence. For example, according
to size, with the largest on top and the others in sequence running clockwise.
(ii) Pie chart is used only when
(a) total of the parts make a meaningful whole. For example, total of the
expenditures of a family on different items make a meaningful whole,
but if in a city there are 100 doctors, 40 engineers, 50 milkmen, 80
businessmen then total of these do not make a meaningful whole so
pie chart should not be used here.
(b) observations in different parts are mutually exclusive. For example in
the situation discussed in part (a) a businessman may also be an
engineer so the observations in different parts are not mutually
exclusive.
(c) observations of the different parts are observed at the same time.
We have discussed the method of drawing pie diagram, in this section. Let us
discuss some limitations of the pie diagram.
14.6 PICTOGRAM
Pictograms, also known as picture grams, are very frequently used in
representing statistical data. Pictograms are drawn with the help of pictures.
These diagrams indicate the nature of the represented facts.
Pictograms are attractive and easy to comprehend, and as such this method is
particularly useful in presenting statistics to the layman.
The picture used as a symbol to represent the units or values of any
variable or commodity should be selected carefully. The picture symbol must be
self-explanatory in nature. For example, if the increase in the number of airline
companies is to be shown over a period of time, then the appropriate symbol
would be an aeroplane.
The pictograms have the following merits:
(i) The magnitudes of the variables may be known by counting the pictures.
(ii) An illiterate person can also get the information.
(iii) The facts represented in a pictorial form can be remembered longer.
Example 11: Draw a pictogram for the data of production of tea (in hundred
kg) in a particular area of Assam from year 2006 to 2010.
14.7 CARTOGRAM
Representation of numerical facts with the help of a map is known as a
cartogram. By representing the facts on maps, the impact of the results on
different geographical areas may be shown and compared. Maps are
helpful in the comparative study of various districts of a state or different
states of a country. For example, the production of wheat in different
geographical areas can be represented by a cartogram. The quantities on the map
can be shown in many ways, such as through shades or colours, by dots, by placing
pictograms in each geographical area, or by the appropriate numerical figure in
each geographical area.
Let us take an example to see what a cartogram looks like.
Example 11: Density per square kilometre in different states and union
territories in India according to 2011 census data is given below.
State/Union     Density     State/Union     Density     State/Union        Density
Territory       (per sq.    Territory       (per sq.    Territory          (per sq.
                km)                         km)                            km)
Andhra P        308         Kerala          859         Tripura            350
Arunachal P     17          Madya P         236         Uttarakhand        189
Assam           397         Maharashtra     365         Uttar P            828
Bihar           1102        Manipur         122         West Bengal        1029
Chhattisgarh    189         Meghalaya       132         Andaman and N I    46
Goa             394         Mizoram         52          Chandigarh         9252
Gujarat         308         Nagaland        119         Dadar and N H      698
Haryana         573         Orissa          269         Daman and Diu      2169
Himachal P      123         Punjab          550         Delhi              11297
J and K         124         Rajasthan       201         Lakshadeep         2013
Jharkhand       414         Sikkim          86          Pondicherry        2598
Karnataka       319         Tamil Nadu      555
Represent the above data with the help of cartogram.
Solution: Cartogram for the above data is given below:
14.9 SUMMARY
This unit covered the diagrammatic presentation of the data. In this unit, we
have discussed:
1) One dimensional diagrammatic presentation of the data.
2) How to draw different types of bar diagrams.
3) How to draw two dimensional diagrams to represent the given data.
4) How to draw Pie diagram.
5) How to draw Pictograms and Cartograms for the pictorial representations.
6) The selection of an appropriate diagram to represent the data of a given
situation.
14.10 SOLUTIONS/ANSWERS
E1) The suitable diagram in this case is simple bar diagram which is shown
on the next page:
[Figure: simple bar diagram of enrollments of the students (in numbers) for the
years 2005 to 2010, with bar heights 280, 294, 302, 270, 325 and 406]
E2) The suitable diagram in this case is multiple bar diagram which is
shown as follows:
E3) The suitable diagram in this case is multiple bar diagram which is
shown as follows:
E4) The suitable diagram in this case is subdivided bar diagram which is
shown as follows:
E5) The suitable diagram in this case is percentage bar diagram. So first of
all we have to calculate percentage and cumulative percentage for both
the companies in various categories as given below:
Category        Company A                        Company B
                Cost     % Cost   Cumulative     Cost     % Cost   Cumulative
                                  % Cost                           % Cost
Selling price   9500     47.5     47.5           8000     40       40
RM              5500     27.5     75             6500     32.5     72.5
DW              3500     17.5     92.5           4000     20       92.5
ROO             1500     7.5      100            1500     7.5      100
Total           20000    100                     20000    100
On the basis of the above calculation subdivided bar diagram is given below:
E6) The suitable diagram in this case is rectangles diagram. The rectangles
for both companies are to be drawn on the following basis.
Company P
Length = 400 (items sold)
Breadth = 20 (rate per item)
Area = 400 × 20 = 8000
Company Q
Length = 600 (items sold)
Breadth = 30 (rate per item)
Area = 600 × 30 = 18000
Therefore, the lengths and breadths of the two rectangles will be in the
proportions 400 : 600 and 20 : 30 respectively. The areas calculated for
both companies on the basis of the lengths and breadths given above
represent the total cost of the companies. These rectangles are
represented below.
E7) Step 1 Areas of the corresponding squares = 16, 25, 65, 150
Step 3 Square diagram for the given data is shown on the next page:
E9) The suitable diagram in this case is pie diagram. Calculation of degrees
and percentages (as we did in Example 10) is an exercise for you. On
the basis of calculation, pie chart which shows the utilization of 100
paise of income by XYZ company in year 2009-2010 is shown on the
next page:
E10) The suitable diagram in this case is pie diagram. Calculation of degrees
and percentages (as we did in Example 10) is an exercise for you. On
the basis of calculation, pie chart which shows the expenditure of a
family on different items is shown below:
E11) We represent the production of mangoes by pictures of a mango for
the different years according to the magnitude of the data (taking
1 mango = 1 ton of mangoes)
UNIT 15 GRAPHICAL PRESENTATION OF DATA-I
Structure
15.1 Introduction
Objectives
15.2 Graphical Presentation
15.3 Types of Graphs
Histogram
Frequency Polygon
Frequency Curve
Ogive
15.4 Summary
15.5 Solutions/Answers
15.1 INTRODUCTION
An important function of Statistics is to present complex and huge data in
such a way that they can be easily understood. In the previous unit, we
discussed the diagrammatic presentation of data, where we became
familiar with some of the most commonly used diagrams. We now move on to the
graphical presentation of data. Graphs are plotted for frequency distributions
and are used to interpolate/extrapolate items in a series, including locating
various partition values. In this unit, we shall discuss some of the most useful
and commonly used graphs.
The graphical presentation can be divided into two categories
(i) Graphs for frequency distributions.
(ii) Graphs for time series.
In this unit, we will concentrate on the graphs for frequency
distributions only. In this regard, we would like to discuss the most commonly
used graphs for frequency distributions, i.e. histograms, frequency polygons,
frequency curves and cumulative frequency curves, or ogives.
Objectives
After studying this unit, you would be able to:
describe the graphical presentation;
explain the advantages of graphical presentation;
draw the histogram for continuous frequency distribution;
draw the frequency polygon for a frequency distribution;
draw the frequency curves of different shapes; and
draw the cumulative frequency curves.
15.2 GRAPHICAL PRESENTATION
A graphical presentation is a geometric image of a set of data. Graphical
presentation is done for both frequency distributions and time series. Unlike
diagrams, graphs are used to locate partition values like median, quartiles,
etc., in particular, and to interpolate/extrapolate items in a series, in
general. They are also used to measure absolute as well as relative changes in
the data. Another important feature of graphs is that once a person sees a
graph, its picture stays in his/her mind for a long time. Graphs also help
us in studying the cause and effect relationship between two variables. The
graph of a frequency distribution presents huge data in an interesting and
effective manner and brings to light the salient features of the data at a
glance. Before closing this section, let us see some advantages of graphical
presentation.
Advantages of Graphical Presentation
The following are some advantages of the graphical presentation:
It simplifies the complexity of data and makes it readily understandable.
It attracts attention of people.
It saves time and efforts to understand the facts.
It makes comparison easy.
A graph describes the relationship between two or more variables.
After going through the advantages of graphical presentation of data, you may be
keen to know the commonly used graphs to represent the data and how these
graphs are drawn. The next section addresses these issues.
Table 15.1
No of peas per pod 1 2 3 4 5 6 7
Frequency (number of pods) 14 23 66 40 26 18 11
Fig. 15.1: Frequency Bar Diagram for the Frequency Distribution of Number of the Peas for 198 Pods.
Note 1: O represents origin and choice of scale used along horizontal and
vertical axes depends upon given data.
Now, we take the case of frequency distribution of a continuous variable.
The following are the most commonly used graphs for continuous frequency
distributions:
(i) Histogram
(ii) Frequency Polygon
(iii) Frequency Curve
(iv) Cumulative Frequency Curve or Ogives
Let us discuss these one by one:
(i) Histogram
In the previous example, we have discussed how a graph is drawn for a discrete
frequency distribution.
For a continuous frequency distribution, a better way to represent the data
graphically is to use a histogram. A histogram is drawn by constructing
adjacent rectangles over the class intervals such that the heights of the
rectangles are proportional to the corresponding class frequencies.
A histogram is similar to a bar diagram, but it represents a frequency
distribution with continuous classes. The width of each bar is equal to the width
of its class interval. Each rectangle is joined with the next so as to give a
continuous picture.
The class boundaries are located on the horizontal axis. If the class intervals
are of equal size, the heights of the rectangles will be proportional to the
class frequencies themselves. If the class intervals are not of equal size, the
heights of the rectangles will be proportional to the ratios of the frequencies
to the widths of the corresponding classes. In other words, the frequencies of
the class intervals having the least width are written as they are, and the
frequencies of the other class intervals are written as follows:
$\dfrac{\text{Given frequency}}{\text{Width of its class interval}} \times \text{The least width}$ … (15.1)
Let us draw a histogram for the frequency distribution given below in
Table 15.2.
Table 15.2
Class Intervals   0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
Frequency         2      3       13      18      9       7       6       2
Fig. 15.2: Histogram for Frequency Distribution when Class-intervals are of Equal Width.
Now, let us consider the frequency distribution for unequal class intervals as
given in the Table 15.3
Table 15.3
Class 0-10 10-20 20-30 30-40 40-70 70-80 80-100
Frequency 20 32 8 2 60 35 10
As this is a case of unequal class intervals, we have to adjust the frequencies
of the classes 40-70 and 80-100 by the formula given in equation (15.1). These
calculations are shown in Table 15.4 given below:
Table 15.4
Class Interval (CI)   Frequency   Width of CI   Height of the rectangle
0-10                  20          10            20
10-20                 32          10            32
20-30                 8           10            8
30-40                 2           10            2
40-70                 60          30            (60/30) × 10 = 20
70-80                 35          10            35
80-100                10          20            (10/20) × 10 = 5
The histogram for this frequency distribution is shown in Fig. 15.3.
Fig. 15.3: Histogram for Frequency Distribution when Class Intervals are of Unequal Width
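The adjustment of equation (15.1) for unequal class intervals can be sketched in a few lines. The tuple layout `(lower, upper, frequency)` and the names `classes`, `least_width` and `heights` are assumptions made for this illustration only.

```python
# Frequency distribution of Table 15.3 as (lower limit, upper limit, frequency).
classes = [
    (0, 10, 20), (10, 20, 32), (20, 30, 8), (30, 40, 2),
    (40, 70, 60), (70, 80, 35), (80, 100, 10),
]

# The least class width is the reference width for the adjustment.
least_width = min(upper - lower for lower, upper, _ in classes)

# Equation (15.1): (given frequency / width of its class) x least width.
# Classes that already have the least width keep their frequency unchanged.
heights = [freq * least_width / (upper - lower) for lower, upper, freq in classes]

for (lower, upper, freq), h in zip(classes, heights):
    print(f"{lower}-{upper}: frequency {freq}, height {h}")
```

With these heights, the area of every rectangle stays proportional to the frequency of its class, which is what keeps the wide classes 40-70 and 80-100 from looking inflated.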
Note 2: Sometimes, a histogram is also used for the frequency distribution of a
discrete variable. Each value of the discrete variable is regarded as the mid-
point of an interval. But generally, its use is not recommended, because in
discrete case each frequency actually corresponds to a single point and not to
an interval.
Now, you can try the following exercises.
E1) Draw a histogram from the following data
Class Interval: 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90
Frequency: 3 5 10 14 24 17 14 10 3
E2) Draw a histogram for the following frequency distribution
Wage (Rs):        0-5  5-10  10-15  15-20  20-25  25-30  30-35  35-40  40-45  45-50
No. of Workers:   30   70    100    110    140    150    130    100    90     60
Fig. 15.4: Frequency Polygon for the Frequency Distribution given in Table 15.2.
Note 3: In some cases the first class interval does not start from zero. In such
situations we mark a kink on the horizontal axis, which indicates the
continuity of the scale starting from zero. Let us take an example of this type.
Solution: Frequency polygon for the given data is shown in Fig. 15.5:
points (not necessarily all points) of the frequency polygon such that
(a) Like the frequency polygon, it also starts from the base line (horizontal
axis) and ends at the base line.
(b) Area under frequency curve remains approximately equal to the area under
the frequency polygon.
To explain the concept theoretically, suppose we draw a sample of size n from a
large population. A frequency curve is the graph of a continuous variable.
Theoretically, continuity of the variable implies that however small a class
interval we take, there will be some observations in that class interval. That
is, in this case there will be a large number of line segments, and the
frequency polygon tends to coincide with the smooth curve passing through these
points as the sample size (n) increases. This smooth curve is known as the
frequency curve.
In the following example we have drawn both frequency polygon and
frequency curve to make the idea clear for you.
Example 2: Draw frequency polygon and frequency curve for the following
frequency distribution.
Solution: Frequency polygon and frequency curve for the above data is given
below in Fig. 15.6.
On the next page some important types of frequency curves are given which
are generally obtained in the graphical presentations of frequency distributions.
That is, symmetrical, positively skewed, negatively skewed, J shaped, U
shaped, bimodal and multimodal frequency curves. You note that the shapes of
these curves justify their names.
Fig. 15.7
More Than Ogive: If we plot the points with the lower limits of the classes as
abscissae and the cumulative frequencies corresponding to the values more
than the lower limits as ordinates, and join the points so plotted by line
segments, the curve thus obtained is known as the “more than
cumulative frequency curve” or “more than ogive”. It is a falling curve.
Let us draw both the ogives (‘less than’ and ‘more than’) for the following
frequency distribution of the weekly wages of number of workers given in
Table 15.5.
Table 15.5
Weekly wages     0-10   10-20   20-30   30-40   40-50
No. of workers   45     55      70      40      10
Table 15.6
Weekly   No. of     Less than cumulative       More than cumulative
wages    workers    frequency distribution     frequency distribution
                    Wages        Number of     Wages        Number of
                    less than    workers       more than    workers
0-10     45         10           45            0            220
10-20    55         20           100           10           175
20-30    70         30           170           20           120
30-40    40         40           210           30           50
40-50    10         50           220           40           10
From above data, we construct both the ogives as shown in Fig. 15.8 and Fig.
15.9:
For “less than ogive” as shown on previous page in Fig. 15.8, we have plotted
the points (10, 45), (20,100), (30, 170), (40, 210), (50, 220) and then joined
them by line segments. Similarly, for “more than ogive” as shown above in
Fig. 15.9, we have plotted the points (0, 220), (10, 175), (20, 120), (30, 50),
(40, 10), and then joined them by line segments.
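The points plotted for the two ogives can be generated from the frequency distribution of Table 15.5. This is a small sketch with illustrative names (`less_than`, `more_than`), not a prescribed procedure.

```python
from itertools import accumulate

# Frequency distribution of Table 15.5 (weekly wages of workers).
lower = [0, 10, 20, 30, 40]    # lower limits of the classes
upper = [10, 20, 30, 40, 50]   # upper limits of the classes
freq = [45, 55, 70, 40, 10]    # number of workers in each class

total = sum(freq)

# "Less than" ogive: upper limit against the running total of frequencies.
less_than = list(zip(upper, accumulate(freq)))

# "More than" ogive: lower limit against the total minus everything below it.
below = accumulate([0] + freq[:-1])
more_than = list(zip(lower, (total - b for b in below)))

print(less_than)
print(more_than)
```

The first list rises to the total frequency and the second falls from it, matching the rising and falling shapes of the two curves.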
If we want to obtain a partition value using ogives, we draw a dotted horizontal
line through the point on the y-axis which corresponds to the partition value,
and then, from the point where it meets the less than ogive, we draw a dotted
vertical line and let it meet the x-axis. The abscissa of the point where it
meets the x-axis is the required partition value. For example, suppose we want
to find the first quartile; then we draw a dotted horizontal line starting from
the y-axis at the point corresponding to N/4 and let it meet the “less than
ogive”. From that point on the “less than ogive”, we draw a dotted vertical line
and let it meet the x-axis. The abscissa corresponding to this point is the
first quartile. Similarly, for finding the median or second quartile, we start
drawing the dotted horizontal line from the y-axis at the point corresponding
to N/2 and then proceed as described above. For the third quartile, 3N/4 is
taken in place of N/2. In this way, we may find any partition value.
Note 4: The median may also be obtained by drawing a dotted vertical line through
the point of intersection of both the ogives, when drawn on a single figure.
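Reading a partition value off the less than ogive amounts to linear interpolation along its line segments. The sketch below does this numerically for the wages data of Table 15.6. Two assumptions are made: the starting point (0, 0) is added so that the first class can be interpolated, and the helper name `partition_value` is hypothetical.

```python
def partition_value(points, target):
    """Abscissa at which the piecewise-linear less than ogive reaches target."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            # Move along the segment in proportion to the frequency still needed.
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("target lies outside the ogive")


# Less than ogive of Table 15.6, with the assumed starting point (0, 0).
ogive = [(0, 0), (10, 45), (20, 100), (30, 170), (40, 210), (50, 220)]
N = ogive[-1][1]  # total frequency, 220

q1 = partition_value(ogive, N / 4)       # first quartile
median = partition_value(ogive, N / 2)   # second quartile
q3 = partition_value(ogive, 3 * N / 4)   # third quartile

print(round(q1, 2), round(median, 2), round(q3, 2))
```

For this distribution the three values come out at roughly 11.8, 21.4 and 29.3. A value read from a hand-drawn ogive should agree with these up to the accuracy of the drawing.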
Now, you can try the following exercises.
E5) Draw two ogives from the following data
Class: 0-10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 80-90
Frequency: 3 6 10 13 20 18 15 9 6
Hence find the median. Compare your result with the median obtained by direct
calculation.
E6) Draw less than ogive from the following frequency distribution of marks
of 90 students
Marks: 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79
No. of Students: 7 11 19 8 20 14 8 3
Hence find Q1, Q2 and Q3.
E7) Draw the more than ogive for the following frequency distribution of the
weekly wages of workers:
Weekly wages: 0-10 10-20 20-40 40-50 50-60 60-70 70-80 80-90 90-100
No. of Workers: 5 15 20 30 45 35 25 15 10
15.4 SUMMARY
In this unit we have discussed:
1) Various types of graphical presentation of data.
2) Way of drawing histogram for continuous frequency distributions.
3) Frequency polygon for a frequency distribution.
4) Frequency curves of different shapes, and
5) Way of drawing cumulative frequency graphs or ogives.
15.5 SOLUTIONS/ANSWERS
E3) Frequency polygon of the given data by first drawing histogram is given
on the next page.
[Figure: frequency polygon for E3, drawn over the histogram]
E5) Two ogives for the given data are given on the next page.
E7) More than ogive for the given frequency distribution is given below:
UNIT 16 GRAPHICAL PRESENTATION OF DATA-II
Structure
16.1 Introduction
Objectives
16.2 Time Series Graphs
Method of Drawing a Time Series Graph
Types of Time Series Graph
16.3 Stem-and-Leaf Displays
Stem-and-Leaf Display for More than one Set of Data
Merits of Stem-and-Leaf Display
16.4 Box Plots
Method of Construction of the Box Plots
Components of a Box Plot
Box Plots with Outliers
Box Plots with + Signs
Box Plots with Whisker, + Sign and Outliers
16.5 Summary
16.6 Solutions/Answers
16.1 INTRODUCTION
In Unit 15 of this block, we have discussed some of the techniques of graphical
presentation of data. In that unit, we have restricted ourselves to the graphical
methods which are used for representing frequency distributions. The present
unit discusses the graphical methods for time series data. A time series graph is
frequently used for analysing and presenting the time series data. Range chart
is a type of time series graph which is used for showing the range of variation.
The Band chart is another type of time series graph which shows the total for
successive time periods broken up into subtotals for each of the component
parts of the total. In this unit, we shall also discuss as to how data are
represented by plotting stem-and-leaf displays and box plots. Stem-and-leaf
display is like histograms, but here the additional feature is that the given value
of each individual is also shown in these displays. Further, in this unit we shall
discuss box plots to represent the data through five-number summary.
Objectives
After studying this unit, you should be able to:
describe a time series graph;
describe the method of drawing a time series graph;
draw a range chart and band chart;
describe the method of drawing a stem-and-leaf display;
describe the box plot and the different parts of the box plot; and
draw the box plots.
Years 2001 2002 2003 2004 2005 2006 2007 2008 2009
Production 30 25 35 40 25 30 40 45 20
[Figure: time series graph of Production for the years 2001 to 2009]
16.2.2 Types of Time Series Graph
There are two types of time series graphs:
(i) Range Chart (ii) Band Chart
Let us discuss these one by one:
(i) Range Chart
A range chart is a very useful method of showing the range of variation or
fluctuation between the maximum and minimum values of a variable at each point
of time. For example, if we are interested in showing the minimum and maximum
prices of a commodity for different periods of time, or the minimum and maximum
marks obtained by the students in different years, etc., the range chart would
be the appropriate option.
For drawing a range chart, we take the time variable along the x-axis and the
value of the other variable along the y-axis. Then we draw two line graphs
together by plotting the maximum and minimum observations in the given data: one
curve represents the highest values of the variable at different points of time,
and the other represents the lowest values at the same points of time. The gap
between the two curves represents the range of variation. To highlight the
difference between the lowest and highest values, some colour or shade should be
used. Let us take an example of drawing a range chart.
Example 1: Represent the following data by range chart.
Days         Max. Temp. in °C   Min. Temp. in °C
Monday       38                 12
Tuesday      41                 16
Wednesday    35                 14
Thursday     42                 15
Friday       44                 18
Saturday     45                 20
Sunday       46                 21
Solution: Since there are two variables with same scales of measurement, both
the variables are shown on the same graph as in Fig. 16.2.
Fig. 16.2 Range Chart for the Maximum and Minimum Temperature in a Week
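The gap between the two curves of the range chart is simply the difference between the maximum and minimum values at each point of time. A minimal sketch for the data of Example 1 follows; the variable names are illustrative.

```python
# Data of Example 1: daily maximum and minimum temperature (in °C).
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
max_temp = [38, 41, 35, 42, 44, 45, 46]
min_temp = [12, 16, 14, 15, 18, 20, 21]

# Range of variation on each day = height of the shaded band on that day.
ranges = [hi - lo for hi, lo in zip(max_temp, min_temp)]

for day, lo, hi, r in zip(days, min_temp, max_temp, ranges):
    print(f"{day}: from {lo} to {hi}, range {r}")
```

Plotting `max_temp` and `min_temp` against `days` as two line graphs and shading the region between them gives the range chart.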
Now, you can try the following exercises.
E1) Draw a range chart for the following data:
Class:        1   2   3   4   5   6   7   8   9   10
Max. Marks:   58  65  74  61  87  65  78  92  67  84
Min. Marks:   15  21  25  32  26  16  19  22  24  17
E2) Draw a band graph for the following data of quarterly results for profit (in
lakhs of rupees):
Quarters: Plant-I Plant-II Plant-III
Quarter-I 34 43 46
Quarter-II 41 47 41
Quarter-III 38 39 44
Quarter-IV 51 57 53
After arranging the leaves in ascending order of magnitude, we have
0 | 0 0 2 8 8
1 | 1 4 4 7 8
2 | 2 4 6 8 8 9
3 | 3 5
4 | 2 2 6 8
Here starting parts show the 'tens digits' and the leaves show the ‘ones digits’ in
the above stem-and-leaf display. At a glance, one can see that 4 students got
marks in the 40's in their test out of 50. Out of these four students two got 42
marks each, whereas the other two got 46 and 48 marks in the test. Fourth row
(i.e. fourth stem) indicates that two students got marks in 30’s in their test out
of 50. And actual marks of these two students are 33 and 35. Similarly, we can
get the information about the marks of the other students from successive rows
(stems). When you count the total numbers of leaves, you may know how
many students appeared in the test. The information is nicely organised when a
stem-and-leaf display is used. Stem-and-leaf display provides a tool for specific
information in large sets of data, otherwise one would have a long list of marks
to arrange and analyse.
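Building such a display is a matter of splitting every value into its tens digit (starting part) and ones digit (leaf). The sketch below uses the marks as they can be read back from the display above, because the original list of marks is not reproduced here; the function name `stem_and_leaf` is illustrative.

```python
def stem_and_leaf(values):
    """Map each tens digit (stem) to its ones digits (leaves), in ascending order."""
    display = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display.setdefault(stem, []).append(leaf)
    return display


# Marks read back from the display (assumed to be the data of the example).
marks = [0, 0, 2, 8, 8, 11, 14, 14, 17, 18, 22, 24,
         26, 28, 28, 29, 33, 35, 42, 42, 46, 48]

display = stem_and_leaf(marks)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

# Counting the leaves gives the number of students who appeared in the test.
n_students = sum(len(leaves) for leaves in display.values())
```

Sorting the values first is what puts the leaves of every stem in ascending order.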
16.3.1 Stem-and-Leaf Display for more than one Set of Data
Stem-and-leaf display is also used to compare two sets of data. That is known
as 'back to back' stem-and-leaf display. For example, if you want to compare
the batting scores of two cricket players, then stem-and-leaf display is right
way to represent the data.
Example 4: Draw a stem-and-leaf display for batting scores of two players
given below.
Player A 102, 61, 82, 88, 90, 63, 69, 85, 105, 93, 65, 94, 107, 97, 67
Player B 104, 62, 83, 95, 106, 95, 108, 63, 108, 82, 93, 109
Solution: The scores of two players can be compared with the help of back to
back stem-and-leaf display as follows:
Leaf (Player A) | Starting part | Leaf (Player B)
      1 3 5 7 9 |       6       | 2 3
          2 5 8 |       8       | 2 3
        0 3 4 7 |       9       | 3 5 5
          2 5 7 |      10       | 4 6 8 8 9
Here the column of starting parts is in the middle, and the leaf columns are
to the right (player B) and left (player A) of the column of starting parts. You
can see that player B has more innings with high scores (100 or more) than
player A. Player B has only 2 innings with scores in the 60s (62 and 63), while
player A has 5 innings with the scores of 61, 63, 65, 67 and 69. You can also
see that player B has the highest score of 109, compared to player A with the
highest score of 107. Thus we see that presentation of the data by a
stem-and-leaf display provides us a lot of information very quickly.
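A back to back display can be assembled by grouping the leaves of each player by starting part and printing them on either side of a shared column. This is only a sketch; the helper `leaves_by_stem` and the column widths are choices made for this illustration.

```python
def leaves_by_stem(scores):
    """Group the ones digits of the scores by their starting part (score // 10)."""
    rows = {}
    for score in sorted(scores):
        rows.setdefault(score // 10, []).append(score % 10)
    return rows


# Batting scores of Example 4.
player_a = [102, 61, 82, 88, 90, 63, 69, 85, 105, 93, 65, 94, 107, 97, 67]
player_b = [104, 62, 83, 95, 106, 95, 108, 63, 108, 82, 93, 109]

a = leaves_by_stem(player_a)
b = leaves_by_stem(player_b)

lines = []
for stem in sorted(set(a) | set(b)):
    left = "".join(str(d) for d in a.get(stem, []))
    right = "".join(str(d) for d in b.get(stem, []))
    # Player A on the left, shared starting part in the middle, player B on the right.
    lines.append(f"{left:>6} | {stem:>2} | {right}")

print("\n".join(lines))
```

Taking the union of the starting parts ensures that a stem used by only one player would still appear, with an empty side for the other player.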
In above two examples stem width or category interval was 10. Now we take
an example in which stem width is 5 instead of 10.
Example 5: Arrange the numbers 47, 35, 37, 20, 43, 15, 15, 26, 46, 25, 29, 12,
39, 44, 21, 24, 16, 40, 19, 46, 30, 34, 17, 39, 16, 40, 31, 21, 14, 42, 16, 43, 22,
11, 24, 25, 31, 27, 40, 33 in a stretched stem-and-leaf display that has
single-digit starting parts and leaves, but has a stem width of 5.
Solution: A simple stem-and-leaf display has a unique starting part for each
stem with stem width 10, while the stretched stem-and-leaf display shown
below has a stem width of 5. This means that we have stretched each stem (of
stem width 10) into two stems, each of width 5, and the same starting part is
used for both stems (i.e. for stem 1 we use 1a and 1b, for stem 2 we use 2a and
2b, etc.; this is also explained below the display). The required stretched
stem-and-leaf display is given as follows:
1a 241
1b 5569766
2a 014124
2b 65957
3a 04113
3b 5799
4a 3400230
4b 766
In this stem-and-leaf display 'a' stands for the leaf interval 0-4 and 'b' stands
for 5-9. The values between 10 and 19 of stem 1 are now split into two stems 1a
and 1b, which include the values 10-14 and 15-19 respectively. Similarly, the
values between 20 and 29 of stem 2 are now represented by two stems 2a and 2b,
which include the values 20-24 and 25-29 respectively, and so on.
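The splitting rule just described ('a' for leaves 0-4, 'b' for leaves 5-9) can be sketched in Python as below. The function name `stretched_stem_and_leaf` is a hypothetical name introduced for illustration, and the sketch assumes two-digit values so that the starting part is a single digit.

```python
def stretched_stem_and_leaf(values):
    """Group two-digit values into half-stems: 'a' for leaves 0-4, 'b' for 5-9."""
    display = {}
    for v in values:
        stem, leaf = divmod(v, 10)               # e.g. 47 -> stem 4, leaf 7
        label = f"{stem}{'a' if leaf <= 4 else 'b'}"
        display.setdefault(label, []).append(leaf)
    # Return the half-stems in order 1a, 1b, 2a, ...; leaves stay in the
    # order in which the values were given (unsorted, as in the solution).
    return {label: display[label] for label in sorted(display)}

numbers = [47, 35, 37, 20, 43, 15, 15, 26, 46, 25, 29, 12, 39, 44, 21, 24,
           16, 40, 19, 46, 30, 34, 17, 39, 16, 40, 31, 21, 14, 42, 16, 43,
           22, 11, 24, 25, 31, 27, 40, 33]
for label, leaves in stretched_stem_and_leaf(numbers).items():
    print(label, "".join(str(d) for d in leaves))
```

Counting the leaves over all half-stems should give back the 40 observations of Example 5, which is a quick check that no value has been dropped.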
16.3.2 Merits of Stem-and-Leaf Display
Following are some merits of stem-and-leaf display:
(i) Stem-and-leaf display arranges the data according to place value.
(ii) The total number of observations and the mode can easily be obtained
from a stem-and-leaf display (see Example 3).
(iii) It summarises the shape of a set of data (the distribution) and also
retains the detail of the individual values.
(iv) Stem-and-leaf display also enables you to find quantiles such as the median,
quartiles (i.e. $Q_1, Q_2, Q_3$), deciles (i.e. $D_1, D_2, D_3, ..., D_9$), percentiles
(i.e. $P_1, P_2, P_3, ..., P_{99}$), etc., as discussed below.
Formula for Calculating Quantiles: First of all, the given observations are
arranged in ascending order of magnitude. Then the (j/m)th quantile, denoted
by $Q_{j/m}$ (e.g. 7/10 of the data are below $Q_{7/10}$), is given by
$Q_{j/m} = x_i$, where $x_i$ is that value of the variable below which a fraction
j/m of the observations lie and
$i = \frac{j \times n}{m} + \frac{1}{2}$ ... (16.1)
where n = total number of observations.
For example, let us apply this formula for finding the median for the data of
Example 3:
Median = Second quartile = $Q_{2/4}$:
$i = \frac{j \times n}{m} + \frac{1}{2} = \frac{2 \times 22}{4} + \frac{1}{2} = 11.5$
$\therefore$ median = $x_{11.5}$ = 11th observation in the array
+ 0.5 (12th observation – 11th observation)
= 22 + 0.5 (24 – 22) = 23
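The calculation above can be sketched in Python as follows. The function name `quantile` is introduced here for illustration, exact fractions are used only to avoid rounding in the position i, and the handling of positions that fall outside 1 to n is an assumption of this sketch (the text does not discuss that case). The data are the 40 numbers of Example 5.

```python
from fractions import Fraction

def quantile(values, j, m):
    """(j/m)th quantile using the position i = j*n/m + 1/2 of formula (16.1)."""
    data = sorted(values)
    n = len(data)
    i = Fraction(j * n, m) + Fraction(1, 2)   # exact, possibly fractional, position
    k = i.numerator // i.denominator          # whole part of the position
    frac = i - k                              # fractional part of the position
    # Assumption (not stated in the text): clamp positions outside 1..n.
    if k < 1:
        return float(data[0])
    if k >= n:
        return float(data[-1])
    # Interpolate between the k-th and (k+1)-th ordered values (1-based).
    return float(data[k - 1] + frac * (data[k] - data[k - 1]))

numbers = [47, 35, 37, 20, 43, 15, 15, 26, 46, 25, 29, 12, 39, 44, 21, 24,
           16, 40, 19, 46, 30, 34, 17, 39, 16, 40, 31, 21, 14, 42, 16, 43,
           22, 11, 24, 25, 31, 27, 40, 33]
print(quantile(numbers, 2, 4))   # median: i = 20.5, so 27 + 0.5 * (29 - 27) = 28.0
```

The same function gives a decile with `quantile(values, 7, 10)` or a percentile with `quantile(values, 67, 100)`, since only j and m change in formula (16.1).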
Now, you can try the following exercises.
E3) Draw a stem-and-leaf display with the following marks obtained by 30
students.
77, 80, 82, 68, 65, 59, 61, 57, 50, 62, 61, 70, 69, 64, 67,
70, 62, 65, 65, 73, 76, 87, 80, 82, 83, 79, 79, 71, 80, 77
Also determine the median for the marks.
E4) Draw a stem-and-leaf display for the following data.
31, 42, 22, 27, 33, 57, 67, 58, 64, 44, 65, 59, 46, 61, 35, 26, 63
Also find seventh decile.
E5) Draw a stem-and-leaf display for the given data:
141, 137, 105, 139, 107, 144, 110, 135, 117, 125, 147, 113, 109, 120,
132, 110, 130, 112
Also find the sixty-seventh percentile.
The method of construction of separate box plots for the data of boys and girls
is discussed below:
There are several ways of constructing a box plot. The first relies on the
quartiles and the lowest and greatest values in the distribution of scores.
Fig. 16.4 shows how these three statistics are used for the above example. We
draw a box plot for each gender, with the box extending from the 1st quartile to
the 3rd quartile. The 2nd quartile is drawn inside the box. Therefore,
For Boys
The lowest or smallest observation = $x_s$ = 16.
First quartile $Q_1 = \left(\frac{16+1}{4}\right)^{\text{th}}$ item
= 4.25th item = 4th item + 0.25 (5th item – 4th item)
= 19 + 0.25 (19 – 19) = 19
Second quartile $Q_2 = 2\left(\frac{16+1}{4}\right)^{\text{th}}$ item
= 8.5th item = mean of 8th and 9th items
= $\frac{22+23}{2}$ = 22.5
Third quartile $Q_3 = 3\left(\frac{16+1}{4}\right)^{\text{th}}$ item
= 12.75th item = 12th item + 0.75 (13th item – 12th item)
= 25 + 0.75 (25 – 25) = 25
The largest observation = $x_l$ = 31
For Girls
The lowest or smallest observation = $x_s$ = 14
First quartile $Q_1 = \left(\frac{21+1}{4}\right)^{\text{th}}$ item
= 5.5th item = mean of 5th and 6th items = $\frac{17+17}{2}$ = 17
Second quartile $Q_2 = 2\left(\frac{21+1}{4}\right)^{\text{th}}$ item = 11th item = 19
Third quartile $Q_3 = 3\left(\frac{21+1}{4}\right)^{\text{th}}$ item
= 16.5th item = mean of 16th and 17th items = $\frac{21+21}{2}$ = 21
The largest observation = $x_l$ = 28
Box plots for boys and girls on the basis of the above findings are shown below
in Fig. 16.4.
Fig. 16.4: The Box Plots with the Whiskers for Boys and Girls
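The five values needed for such a box plot (smallest observation, three quartiles, largest observation) can be computed with a short Python sketch. It follows the positions used in the calculations above, i.e. the r-th quartile is the r(n+1)/4-th item with linear interpolation between neighbouring items. The function names `item` and `five_number_summary` are hypothetical names introduced here, and the clamping of positions outside 1 to n is an assumption of this sketch. Since the boys' and girls' scores are not repeated in this sub-section, the numbers of Example 5 are reused purely for illustration.

```python
def item(data, position):
    """Value at a 1-based, possibly fractional, position in sorted data."""
    k = int(position)              # whole part of the position
    frac = position - k            # fractional part of the position
    # Assumption (not stated in the text): clamp positions outside 1..n.
    if k < 1:
        return data[0]
    if k >= len(data):
        return data[-1]
    return data[k - 1] + frac * (data[k] - data[k - 1])

def five_number_summary(values):
    """Smallest value, Q1, Q2, Q3 and largest value, with Qr at r(n+1)/4."""
    data = sorted(values)
    n = len(data)
    q1, q2, q3 = (item(data, r * (n + 1) / 4) for r in (1, 2, 3))
    return data[0], q1, q2, q3, data[-1]

numbers = [47, 35, 37, 20, 43, 15, 15, 26, 46, 25, 29, 12, 39, 44, 21, 24,
           16, 40, 19, 46, 30, 34, 17, 39, 16, 40, 31, 21, 14, 42, 16, 43,
           22, 11, 24, 25, 31, 27, 40, 33]
print(five_number_summary(numbers))   # (11, 19.25, 28.0, 39.75, 47)
```

Here n = 40, so the quartiles sit at the 10.25th, 20.5th and 30.75th items. The box would extend from 19.25 to 39.75 with the median line at 28, and the whiskers would reach 11 and 47.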
Fig. 16.5: Box Plot for Boys with Inner and Outer Fences
The box plot with the components discussed above for the data of girls is shown
in Fig. 16.6.
Fig. 16.6: Box Plot for Girls with Inner and Outer Fences
Fig. 16.6: The Box Plot for Girls with the Outlier.
Fig. 16.7: The Box Plot with Whisker and + Signs for Boys and Girls Data.
Fig. 16.7 provides a revealing summary of the data. Since half of the scores are
between the hinges (recall that the hinges are the first and third quartiles), we
see that half of the girls' times are between 17 and 21, whereas half of the boys'
times are between 19 and 25.
16.4.5 Box Plots with Whisker, + Sign and Outliers
On the basis of the data of the example discussed in sub-section 16.4.1 we see
that girls generally dropped the balls from one box to another faster than boys.
We also see that one boy was slower than almost all of the girls (except 3). Fig.
16.8 shows the box plot for the girls' data with whisker, + sign and outliers.
Fig. 16.8: A Box Plot for the Girls' Data with Whisker, + Sign and Outliers
Note 2: Learners who are interested in knowing more about the topics discussed
in Secs. 16.4 and 16.5 may refer to chapters 6 and 7 of the book given at
serial number 5 in the reference books listed below the introduction of MST-
001 on page number 4 of Block 1.
Now, you can try the following exercises.
E6) Draw a box plot for the given data:
17, 15, 17, 20, 13, 15, 15, 16, 16, 15, 19, 12, 19, 14, 11, 14, 16, 10, 19, 18,
20, 14, 17, 19, 16, 22, 21, 23, 14, 12, 18, 13, 12, 25, 14, 15, 31, 17, 10, 21
E7) Draw a box plot for the given data:
31, 42, 22, 27, 33, 27, 37, 28, 34, 44, 25, 39, 26, 31, 26, 33, 46, 48, 50
16.7 SUMMARY
In this unit, we have discussed:
4) The box plots and the different components of the box plot.
E3) Stem-and-leaf display of the given data of marks obtained by 30 students
is given below.
5 970
6 85121947255
7 700369917
8 0270230
After arranging the leaves in ascending order of magnitude, we have
5 079
6 11224555789
7 001367799
8 0002237
Median = second quartile = $Q_{2/4}$:
$i = \frac{j \times n}{m} + \frac{1}{2} = \frac{2 \times 30}{4} + \frac{1}{2} = 15.5$
$\therefore$ median = $x_{15.5}$ = 15th value in the array + 0.5 (16th value – 15th value)
= 70 + 0.5 (70 – 70) = 70
E4) Stem-and-leaf display of the given data is given below.
2 276
3 135
4 246
5 789
6 74513
After arranging the leaves in ascending order of magnitude, we have
2 267
3 135
4 246
5 789
6 13457
$D_7$ = seventh decile = $Q_{7/10}$:
$i = \frac{j \times n}{m} + \frac{1}{2} = \frac{7 \times 17}{10} + \frac{1}{2} = 12.4$
$\therefore D_7 = x_{12.4}$ = 12th value in the array + 0.4 (13th value – 12th value)
= 59 + 0.4 (61 – 59) = 59.8
E5) Stem-and-leaf display of the given data is given below.
10 579
11 07302
12 50
13 79520
14 147
After arranging the leaves in ascending order of magnitude, we have
10 579
11 00237
12 05
13 02579
14 147
$P_{67}$ = sixty-seventh percentile = $Q_{67/100}$:
$i = \frac{j \times n}{m} + \frac{1}{2} = \frac{67 \times 18}{100} + \frac{1}{2} = 12.56$
$\therefore P_{67} = x_{12.56}$ = 12th value in the array + 0.56 (13th value – 12th value)
= 132 + 0.56 (135 – 132) = 133.68
E6) After arranging the given data in ascending order of magnitude, we have
10, 10, 11, 12, 12, 12, 13, 13, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 16, 16,
16, 16, 17, 17, 17, 17, 18, 18, 19, 19, 19, 19, 20, 20, 21, 21, 22, 23, 25,
31
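The lowest or smallest observation = $x_s$ = 10
First quartile $Q_1 = \left(\frac{40+1}{4}\right)^{\text{th}}$ item
= 10.25th item = 10th item + 0.25 (11th item – 10th item)
= 14 + 0.25 (14 – 14) = 14
Second quartile $Q_2 = 2\left(\frac{40+1}{4}\right)^{\text{th}}$ item
= 20.5th item = mean of 20th and 21st items
= $\frac{16+16}{2}$ = 16
Third quartile $Q_3 = 3\left(\frac{40+1}{4}\right)^{\text{th}}$ item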
E7) After arranging the given data in ascending order of magnitude, we have
22, 25, 26, 26, 27, 27, 28, 31, 31, 33, 33, 34, 37, 39, 42, 44, 46, 48, 50