Stat I Chapter 1 and 2

UNIT ONE
Introduction to Statistics
Introduction
Statistics is the science that deals with methods of collecting, organizing, and analyzing data and
interpreting the results. The term statistics can also be used in a plural sense: in that sense,
statistics are collections of numerical facts, and values obtained from sample results are also
called statistics. The science of statistics is essential for research and decision making in all
aspects of human life.
Statistical analysis begins with data collection; the data are then analyzed for
one of the following purposes:
• To summarize the findings of some inquiry.
• To obtain a better understanding of the phenomenon under study, primarily as an aid in
generalization or theory validation.
• To make a forecast of some variables, for example, the rate of price movements in the coming
ten years in a given area.
• To evaluate the performance of some program.
• To help in selecting a course of action among a number of alternatives.
Statistics is primarily concerned with how to summarize and interpret variables. A variable is any
characteristic of an object that can be represented as a number. The values that the variable takes
will vary when measurements are made on different objects or at different times.
Each time that we record information about an object we observe a case. We might include several
different variables in the same case. For example, we might measure the height, weight, and hair
color of a group of people in an experiment. We would have one case for each person, and that case
would contain that person's height, weight, and hair color values. All of our cases put together are
called our data set. To some people, statistics means summarized data, such as unemployment
figures or the number of runs, hits, and errors in a baseball game. To others, it means a course of
study. Neither description is adequate. In presenting the statistics as a method of getting
information from data to help managers make decisions, we will see that statistics comprises
various techniques with a wide range of applications to practical problems. In this section you will
be introduced to the definitions of statistics and to the classification and applications of statistical methods.
Definitions of Statistics
Statistics means different things to different people, and its definitions have changed over time.
We can say there are as many definitions as there are people who have tried to define the term.
Some of these definitions are:
• Statistics is a branch of mathematics that consists of a set of analytical techniques that can be
applied to data to help in making judgments and decisions in problems involving uncertainty.
• Statistics is a scientific discipline consisting of procedures for collecting, describing, analyzing and
interpreting numerical data.
• Statistics is a body of principles and methods concerned with extracting useful information from a
set of numerical data.
• Statistics is a body of methods dealing with collection, description, analysis, and interpretation of
information that can be given in a numerical form.
Classification of Statistics
Generally statistics can be classified as descriptive and inferential statistics based on their scope of
coverage.
Descriptive Statistics deals with methods of organizing, summarizing, and presenting numerical
data in a convenient form through graphs, charts, tables, etc. It deals with description of the
characteristics of large masses of data. E.g. the computation of average weekly sales for a
business, average number of students in a class, the average mark for a section for Introductory
statistics course, etc. Most of the statistical information in newspapers, magazines, company
reports, and other publications consists of data that are summarized and presented in a form that
is easy for the reader to understand. Such summaries of data, which may be tabular, graphical, or
numerical, are referred to as descriptive statistics.
Inferential statistics consists of a set of procedures that helps in making inferences and
predictions about a whole population (a collection of persons, objects, or items of interest)
based on information from a sample (a portion of the whole and, if properly taken, is
representative of the whole) of the population. It is a body of methods for drawing conclusions
(that is, making inferences) about characteristics of a population, based on information available
in a sample taken from the population. E.g. Using the average mark of a section for estimating
the average mark for ten sections for a given course. Many situations require information about a
large group of elements (individuals, companies, voters, households, products, customers, and so
on). But, because of time, cost, and other considerations, data can be collected from only a small
portion of the group. The larger group of elements in a particular study is called the population,
and the smaller group is called the sample.
NB. While descriptive statistics describes the characteristics of the observed data and helps to
reach conclusions about that same group only, inferential statistics provides methods for making
generalizations about the whole population based on the sample of observed data.
Important Terms in Statistics
Population is the totality of items under observation. It consists of all items falling into a
defined category; that is, it is the set of all possible observations of some specific
characteristic (usually people, objects, transactions or events). It is frequently large and may
sometimes be indefinitely large. E.g. all students in Ethiopia, all students in Addis Ababa, all
students in the Faculty of Business, all students of a given department.
Census is the gathering of information from all elements in a population. It is the study of each
and every element in the population. It is sometimes called complete enumeration.
Parameter is the descriptive measure of the population under consideration. It is a measured
value of a population. Parameters are usually denoted by Greek or capital letters. Examples of
parameters are population mean (µ), population variance (σ²), and population standard deviation
(σ).
Sample is a finite part of a population. It is a representative subset of the population under
consideration. When taking a sample, it is hoped that the characteristics of the population are
reflected in the sample so that we can generalize or infer from the latter to the former.
Sampling is the process of selecting representatives of a population from that population. It
is gathering information from a part of a population.
Statistic is a descriptive measure of a sample. It is a measured value of a sample. Statistics are
usually denoted by lower case Roman letters. Examples of statistics are sample mean (x̄),
sample variance (s²), and sample standard deviation (s).
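The distinction between a parameter and a statistic can be illustrated with a short Python sketch. The population and sample values below are hypothetical and chosen only to keep the arithmetic simple; the functions come from Python's standard statistics module.

```python
import statistics

# Hypothetical population: marks of all eight students in a small class.
population = [2, 4, 4, 4, 5, 5, 7, 9]
# Hypothetical sample: four of those students.
sample = [2, 4, 5, 9]

# Parameters describe the whole population.
mu = statistics.mean(population)            # population mean (µ)
sigma2 = statistics.pvariance(population)   # population variance (σ²), divides by N
sigma = statistics.pstdev(population)       # population standard deviation (σ)

# Statistics describe the sample only.
x_bar = statistics.mean(sample)             # sample mean (x̄)
s2 = statistics.variance(sample)            # sample variance (s²), divides by n - 1
s = statistics.stdev(sample)                # sample standard deviation (s)

print(f"Parameters: mean={mu}, variance={sigma2}, sd={sigma}")
print(f"Statistics: mean={x_bar}, variance={s2:.2f}, sd={s:.2f}")
```

In this made-up case the sample mean happens to equal the population mean, while the sample variance does not equal the population variance; a different sample would generally give different statistics.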
Variable
A variable is a property of an object or event that can take on different values. For example, college major
is a variable that takes on values like mathematics, computer science, English, psychology, etc.
• Discrete Variable - a variable with a limited number of values, e.g., gender
(male/female) or college class (freshman/sophomore/junior/senior).
• Continuous Variable - a variable that can take on many different values; in theory, any
value between the lowest and highest points on the measurement scale.
Variables can also be broken down into two types:
• Quantitative variables are those for which the value has numerical meaning. The value
refers to a specific amount of some quantity. You can do mathematical operations on the
values of quantitative variables (like taking an average). A good example would be a
person's height.
• Qualitative variables are those for which the value indicates different groupings. Objects
that have the same value on the variable are the same with regard to some characteristic,
but you can't say that one group has "more" or "less" of some feature. It doesn't really
make sense to do math on qualitative (categorical) variables. A good example would be a person's
gender.
Applications of Statistics
Statistics can be used in the following major business areas:
1. Marketing: Statistical analysis is frequently used to provide information for decision making
in the field of marketing. It is necessary first to find out what can be sold and then to evolve
a suitable strategy so that the goods reach the ultimate consumer. A skillful analysis of data
on production, purchasing power, manpower, habits of competitors, habits of consumers, and
transportation costs should be considered in any attempt to establish a new market.
2. Production: In the field of production, statistical data and methods play a very important role. The
decisions about what to produce, how to produce, when to produce, and for whom to produce are
based largely on statistical analysis.
3. Finance: Financial organizations depend very heavily on statistical analysis of facts and figures
to discharge their finance function effectively.
4. Banking: Banking institutions have found it increasingly necessary to establish research departments within
their organizations for the purpose of gathering and analyzing information, not only regarding
their own business but also regarding the general economic situation and every segment of
business in which they may have an interest.
5. Investment: Statistics greatly assists investors in making clear and valid judgments in their
investment decisions, in selecting securities which are safe and have the best prospects of
yielding a good income.
6. Purchase: The purchase department, in discharging its function, makes use of statistical data to
frame suitable purchase policies, such as what to buy, what quantity to buy, when to buy,
where to buy, and from whom to buy.
7. Accounting: Statistical data are also employed in accounting; particularly in the auditing function, the
techniques of sampling and estimation are frequently used.
8. Control: The management control process combines statistical and accounting methods in
making the overall budget for the coming year, including sales, materials, labor and other costs,
net profits, and capital requirements.
CHAPTER TWO
DATA COLLECTION AND PRESENTATION
2.1 Data Collection
Statistical Data
Data are sets of values collected for some purpose. They are raw facts about a phenomenon
which, on their own, do not convey any message. Data are records of the actual state of some measurable
aspect of the universe at a particular point in time. They are not abstract but are
concrete, tangible and countable features of a particular aspect.
2.1.1 Classification of Data

A. Based on their sources
There are two types of data based on their source. These are primary and secondary data.
Primary data – These are data which are the measurements and records of an original study.
They are collected afresh and for the first time, and thus happen to be
original in character. They are directly measured and recorded from the
source and have not been collected by someone else before.
Secondary Data – In some situations it is not feasible for the
principal investigator to start the study from the very beginning. In such a situation he may
use and take into consideration what has already been collected by others.
Secondary data are those which have already been collected by someone else and which
have already been passed through some statistical process. When an investigator uses
data which have already been collected by others, such data are called secondary data.
Secondary data can be taken from journals, reports, periodicals, publications, etc.
Secondary data should be used with greater care. The investigator, before using these data,
must observe that they possess the following characteristics.
1. Reliability of Data: The data collected from another source should be reliable enough
to be used by the investigator. Determining and testing the reliability of secondary
data is the most important as well as the most difficult task. Reliability can be tested by
answering questions like:
• Who collected them?
• What were the sources of data?
• What methods were used to collect them?
• At what time were they collected?
2. Suitability of Data: Before secondary data are used, they must be evaluated to see whether they can serve
a purpose other than the one for which they were collected. The suitability of
data can be evaluated from the point of view of the nature and scope of the investigation.
3. Adequacy of Data: Reliability and suitability of secondary data may not be sufficient
for the investigator to use these data for analysis. Besides these, the data should be tested
for adequacy. Adequacy can be tested by evaluating the data in terms of area
coverage, level of accuracy, number of respondents who participated, and so on.
B. Based on their nature

Data are facts, observations, and information that come from investigations.
The field of statistics deals with measurements—some quantitative and others qualitative. The
measurements are the actual numerical values of a variable. (Qualitative variables could be
described by numbers, although such a description might be arbitrary; for example, North = 1, East = 2,
South = 3, West = 4, or Yes = 1, No = 0.)
Scales of Measurement
The four generally used scales of measurement are listed here from weakest to strongest.
A. Nominal Scale. In the nominal scale of measurement, numbers are used simply as labels
for groups or classes. If our data set consists of blue, green, and red items, we may
designate blue as 1, green as 2, and red as 3. In this case, the numbers 1, 2, and 3 stand
only for the category to which a data point belongs. "Nominal" stands for "name" of
category. The nominal scale of measurement is used for qualitative rather than
quantitative data: blue, green, red; male, female; professional classification; geographic
classification; and so on.
B. Ordinal Scale. In the ordinal scale of measurement, data elements may be ordered
according to their relative size or quality. Four products ranked by a consumer may be
ranked as 1, 2, 3, and 4, where 4 is the best and 1 is the worst. In this scale of
measurement we do not know how much better one product is than others, only that it is
better.
C. Interval Scale. In the interval scale of measurement the value of zero is assigned
arbitrarily and therefore we cannot take ratios of two measurements. But we can take
ratios of intervals. A good example is how we measure time of day, which is in an
interval scale. We cannot say 10:00 A.M. is twice as long as 5:00 A.M., because
0:00 A.M. does not mean the absence of any time. But we can say
that the interval between 0:00 A.M. (midnight) and 10:00 A.M., which is a duration of 10
hours, is twice as long as the interval between 0:00 A.M. and 5:00 A.M., which is a
duration of 5 hours.
Another example is temperature. When we say 0°F, we do not mean zero heat. A
temperature of 100°F is not twice as hot as 50°F.
D. Ratio Scale. If two measurements are in ratio scale, then we can take ratios of those
measurements. The zero in this scale is an absolute zero. Money, for example, is
measured in a ratio scale. A sum of $100 is twice as large as $50. A sum of $0 means
absence of any money and is thus an absolute zero. We have already seen that
measurement of duration (but not time of day) is in a ratio scale. In general, the interval
between two interval scale measurements will be in ratio scale. Other examples of the
ratio scale are measurements of weight, volume, area, or length.
Data can also be classified as either qualitative or quantitative. Qualitative data include labels
or names used to identify an attribute of each element. Qualitative data use either the nominal or
ordinal scale of measurement and may be nonnumeric or numeric. Quantitative data require
numeric values that indicate how much or how many. Quantitative data are obtained using either
the interval or ratio scale of measurement.
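The difference between the interval and ratio scales can be checked numerically. The sketch below is only an illustration: the conversion to the Kelvin scale (which has an absolute zero) is not part of these notes and is brought in as an assumption to show why a ratio of Fahrenheit readings is misleading, while a ratio of durations is meaningful.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit reading (interval scale) to Kelvin (ratio scale)."""
    return (deg_f - 32) * 5 / 9 + 273.15

# Ratio of raw Fahrenheit readings: looks like "twice as hot", but the zero is arbitrary.
naive_ratio = 100 / 50

# Ratio after moving to a scale with an absolute zero.
true_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)

# Durations measured from midnight are on a ratio scale, so this ratio is meaningful.
duration_ratio = (10 - 0) / (5 - 0)

print(f"100°F / 50°F on the Fahrenheit scale: {naive_ratio:.2f}")
print(f"Same temperatures on the Kelvin scale: {true_ratio:.3f}")
print(f"10 hours / 5 hours: {duration_ratio:.2f}")
```

The Fahrenheit ratio of 2 shrinks to roughly 1.1 once an absolute zero is used, whereas the ratio of the two durations stays at 2.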
Methods of Data Collection
Data are records of the actual state of some measurable aspect of the universe at a particular
point in time. Data are not abstract; they are concrete, they are measurements or the tangible and
countable features of the world. In general, data could be quantitative (expressed in numerical
form) or qualitative (expressed in the form of verbal descriptions rather than numbers).
Methods of Primary Data Collection
Primary data are those which are collected afresh and for the first time, and thus happen to be
original in character. Their advantage is their relevance to the user, but they are also likely to be expensive
to collect in terms of time and money. Primary data can be collected using the following
methods:
A. Observation
Observation is the most commonly used method of data collection, especially in behavioral
studies. This method could be used both for cross checking information obtained using other
methods and for understanding processes which are difficult to grasp in an interview context.
This method is useful when studying subjects who are not capable of giving verbal reports of
their feelings for one reason or another.
Advantages of the observation method:
1. subjective bias is eliminated, if observation is done accurately
2. the information obtained relates to what is currently happening; it is not complicated by
either the past behavior or future intentions or attitudes
3. it is independent of respondents' willingness to respond and as such is relatively less
demanding of active cooperation on the part of respondents as happens to be the case in
the interview or the questionnaire method
Limitations:
1. it is expensive
2. the information obtained is limited
3. sometimes unforeseen factors may interfere with the observational task
B. Interview
The interview method of collecting data involves presentation of oral-verbal stimuli and reply in
terms of oral-verbal responses. This method can be used through personal interviews and, if
possible, through telephone interviews.
Personal interviews: This method requires a person (the interviewer) asking questions in
face-to-face contact with the interviewee.
If the interview is carried out in a structured way, it is called a structured interview. This involves
the use of a set of predetermined questions and highly standardized techniques of recording. The
interviewer in a structured interview follows a rigid procedure, asking questions in the
form and order prescribed. In contrast, unstructured interviews are characterized by
flexibility of approach to questioning. In an unstructured interview, the interviewer is allowed much
greater freedom to ask supplementary questions when needed, or at times to omit certain
questions if the situation so requires. He may even change the sequence of questions. But this
sort of flexibility results in a lack of comparability of one interview with another, and the analysis
of unstructured responses becomes much more difficult and time consuming than that of the
structured responses obtained in structured interviews.
Advantages of personal interviews:
1. More information and in greater depth can be obtained
2. The interviewer by his own skill can overcome the resistance, if any, of the respondents
3. There is greater flexibility, especially in the case of unstructured interviews
4. Personal information can be obtained easily
5. Samples can be controlled effectively as there arises no difficulty of missing returns;
non-response generally remains very low
6. The language of the interview can be adapted to the ability or educational level of the person
interviewed
Some of the weaknesses of the personal interview method:
1. It is very expensive, especially when a large and widely spread geographical sample is taken
2. There is the possibility of bias on the part of the interviewer as well as the respondent
3. Certain types of respondents may not be easily approachable (e.g. important officials or
executives, people in high income groups)
4. It is relatively more time consuming
Telephone interviews: This method of collecting information consists of contacting respondents
by telephone. It is not a very widely used method, but it plays an important part in industrial
surveys, particularly in developed countries.
Some of the chief merits of telephone interview are:
1. It is faster than other methods

2. It is cheaper than personal interview method; the cost per response is relatively low
3. Recall is easy; callbacks are easy and economical
4. Replies can be recorded without causing embarrassment to respondents
5. No field staff is required
Some of the demerits of telephone interview are:
1. Little time is given to respondents for considered answers

2. Surveys are restricted to respondents who have telephone facilities
3. It is not suitable for intensive surveys where comprehensive answers are required
4. Questions have to be short and to the point; probes are difficult to handle
C. Questionnaire
This method is quite popular, particularly in the case of big inquiries. Service evaluations of hotels,
restaurants, transportation providers, and other service providers are good examples of
self-administered questionnaires. Often a short questionnaire is left to be completed by the respondent
in a convenient location. In a mail survey, a questionnaire is sent (usually by post) to
the persons concerned with a request to answer the questions and return the questionnaire.
A questionnaire consists of a number of questions printed or typed in a definite order on a form
or set of forms. The questionnaire is mailed to respondents who are expected to read and
understand the questions and write down the reply in the space meant for the purpose in the
questionnaire itself.
The merits of this method are:
1. it is free from the bias of the interviewer; answers are in respondents' own words
2. respondents have adequate time to give well thought out answers
3. respondents who are not easily approachable can also be reached conveniently
The main demerits of this method are:
1. it can be used only when respondents are educated and cooperative
2. control over the questionnaire may be lost once it is sent
3. there is inbuilt inflexibility because of the difficulty of amending the approach once
questionnaires have been dispatched
4. there is also the possibility of ambiguous replies or omission of replies altogether to certain
questions
Methods of Secondary Data Collection
The use of existing data (secondary data) in a research activity is termed desk research simply
because the person carrying it out can usually gather such data without leaving his/her desk. In
any type of study, it is advisable to assess the availability of secondary data before embarking
upon a primary data collection exercise, since the latter is expensive in terms of time, money and
manpower.
The following list includes sources of secondary data:
• Historical documents, archives, maps, photographs, letters, biographies, autobiographies,
diaries, textbooks, periodicals;
• On-line and electronic databases;
• Different Central Statistical Authority publications;
• Different publications by regional governments;
• Various publications by the different ministries.
Data Presentation
After data have been collected, the next step is to present them in some convenient way. The
logic behind data presentation is that statistical data in their raw form are difficult to
understand and summarize. When data are presented, the user can understand them in some
meaningful form within a short period of time. Therefore, data presentation is the process of
re-organization, classification, compilation and summarization of data to present them in a
meaningful form.
2.1.2 Tabular presentation or Frequency Distributions (Absolute, Relative and
Cumulative Distributions)
Frequency distribution: A grouping of data into categories showing the number of
observations in each mutually exclusive category.
It is the organization of raw data in table form using classes and frequencies.
There are two types of frequency distributions: categorical and grouped
frequency distributions.
A. Categorical Frequency Distribution
The categorical frequency distribution is used for data which are qualitatively described.
The important thing here is to classify the data into complete and
non-overlapping categories.
Example: The following are data on the employees of organization X by level of education (LOE)
No    Name      LOE
1     Abebe     Diploma
2     Hordofa   B.Sc
3     Toga      M.Sc
4     Kahsay    Ph.D
5     Ahmed     Diploma
6     Hirut     B.Sc
...   ...       ...
50    Kassech   Ph.D
There are 15 workers having a diploma, 20 workers having a B.Sc, 10 workers having an M.Sc and 5
workers having a Ph.D.
Required: Present the data using an appropriate method of presentation.
Since level of education is a qualitative variable, the appropriate method of presentation is
a categorical frequency distribution. The following is the categorical distribution of employees by
level of education.
Employees of Organization X by LOE
LOE        No    Percentage
Diploma    15    30%
Bachelor   20    40%
Master     10    20%
Ph.D        5    10%
Total      50    100%
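The same table can be produced programmatically. The following Python sketch is an illustration only: it rebuilds the list of 50 education levels from the counts stated in the example (the individual records are not all listed in these notes) and then tallies them with the standard library.

```python
from collections import Counter

# Level of education of the 50 employees, rebuilt from the counts given in the example.
loe = ["Diploma"] * 15 + ["B.Sc"] * 20 + ["M.Sc"] * 10 + ["Ph.D"] * 5

counts = Counter(loe)          # absolute frequency of each category
total = sum(counts.values())   # total number of employees

print(f"{'LOE':<10}{'No':>4}{'Percentage':>12}")
for level in ["Diploma", "B.Sc", "M.Sc", "Ph.D"]:
    pct = 100 * counts[level] / total   # relative frequency in percent
    print(f"{level:<10}{counts[level]:>4}{pct:>11.0f}%")
print(f"{'Total':<10}{total:>4}{100:>11}%")
```

Each employee falls into exactly one category, so the frequencies add up to the total and the percentages add up to 100.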
B. Grouped Frequency Distribution
This is a method of presenting data which are quantitatively measured, used when a variable
contains a large volume of raw data. It involves several important concepts such as class
limits, class width, class interval and frequencies. Class limits are classified as lower class
limit and upper class limit.
In constructing a frequency distribution, we have to first choose the number of classes.
The steps necessary to define the classes for a frequency distribution with quantitative data are:
1. Determine the range
2. Determine the number of non-overlapping classes.
3. Determine the width of each class.
4. Determine the class limits.
Let us demonstrate these steps by developing a frequency distribution for the audit time data.
A. Number of Classes. Classes are formed by specifying ranges that will be used to group
the data. As a general guideline, we recommend using between 5 and 20 classes. For a
small number of data items, as few as five or six classes may be used to summarize the
data. For a larger number of data items, a larger number of classes is usually required.
The goal is to use enough classes to show the variation in the data, but not so many
classes that some contain only a few data items.
There is also a formula which helps to determine the number of classes, K, from the number of
data items, n:
K = 1 + 3.322 log n (logarithm to base 10).
This formula usually does not give a whole number, so the result is rounded.
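As a quick check of the formula, the Python sketch below evaluates K for a few values of n. Rounding to the nearest whole number is an assumption made here for illustration; the notes do not fix a rounding rule, and some texts round up instead.

```python
import math

def number_of_classes(n):
    """Return (raw K, rounded K) from K = 1 + 3.322 log10(n)."""
    k = 1 + 3.322 * math.log10(n)
    return k, round(k)   # rounding to the nearest whole number is an assumption

for n in (20, 30, 100):
    k, k_rounded = number_of_classes(n)
    print(f"n = {n:>3}: K = {k:.3f}, use about {k_rounded} classes")
```

For 20 data items the formula gives a value a little above 5, which is in line with the guideline of using as few as five or six classes for a small data set.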
B. Width of the Classes. The second step in constructing a frequency distribution for
quantitative data is to choose a width for the classes. As a general guideline, we
recommend that the width be the same for each class. Thus the choices of the number of
classes and the width of classes are not independent decisions. A larger number of classes
means a smaller class width, and vice versa. To determine an approximate class width,
we begin by identifying the largest and smallest data values. Then, with the desired
number of classes specified, we can use the following expression to determine the
approximate class width.
Approximate class width = (Largest data value - Smallest data value) / Number of classes
The approximate class width given by this expression can be rounded to a more convenient
value based on the preference of the person developing the frequency distribution. For
example, an approximate class width of 9.28 might be rounded to 10 simply because 10
is a more convenient class width to use in presenting a frequency distribution.
In practice, the number of classes and the appropriate class width are determined by trial
and error. Once a possible number of classes is chosen, the expression is used to find the
approximate class width. The process can be repeated for a different number of classes.
Ultimately, the analyst uses judgment to determine the combination of the number of
classes and class width that provides the best frequency distribution for summarizing the
data. For the audit time data in the table, after deciding to use five classes, each with a
width of five days, the next task is to specify the class limits for each of the classes.
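The class width calculation can be sketched in Python as follows. The smallest (12) and largest (33) audit times are the values quoted in this section; rounding the approximate width up to the next whole day is an assumption that happens to reproduce the width of five days used here.

```python
import math

smallest, largest = 12, 33   # extreme audit times (in days) quoted in this section
num_classes = 5              # number of classes chosen for the audit time data

# Approximate class width = (largest - smallest) / number of classes
approx_width = (largest - smallest) / num_classes

# Rounding up to a whole number of days is an assumption made for convenience.
class_width = math.ceil(approx_width)

print(f"Approximate class width: {approx_width}")
print(f"Convenient class width:  {class_width} days")
```

Trying a different number of classes only requires changing num_classes and re-running the calculation, which mirrors the trial-and-error process described above.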
C. Class Limits. Class limits must be chosen so that each data item belongs to one and only
one class. The lower class limit identifies the smallest possible data value assigned to the
class. The upper class limit identifies the largest possible data value assigned to the class.
In developing frequency distributions for qualitative data, we did not need to specify
class limits because each data item naturally fell into a separate class. But with
quantitative data, such as the audit times in the table, class limits are necessary to
determine where each data value belongs. Using the audit time data in the table, we
selected 10 days as the lower class limit and 14 days as the upper class limit for the first
class.
The smallest data value, 12, is included in the 10–14 class. We then selected 15 days as the
lower class limit and 19 days as the upper class limit of the next class. We continued defining the
lower and upper class limits to obtain a total of five classes: 10–14, 15–19, 20–24, 25–29, and
30–34. The largest data value, 33, is included in the 30–34 class. The difference between the
lower class limits of adjacent classes is the class width. Using the first two lower class limits of
10 and 15, we see that the class width is 15 - 10 = 5. With the number of classes, class width, and
class limits determined, a frequency distribution can be obtained by counting the number of data
values belonging to each class.
The most frequently occurring audit times are in the class of 15–19 days. Eight of the 20 audit
times belong to this class. Only one audit required 30 or more days. Other conclusions are
possible, depending on the interests of the person viewing the frequency distribution. The value
of a frequency distribution is that it provides insights about the data that are not easily obtained
by viewing the data in their original unorganized form.
D. Class Midpoint. In some applications, we want to know the midpoints of the classes in a
frequency distribution for quantitative data. The class midpoint is the value halfway
between the lower and upper class limits. For the audit time data, the five class midpoints
are 12, 17, 22, 27, and 32.
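The class limits and midpoints above can be generated with a few lines of Python. This is a minimal sketch that assumes whole-number data (days), so that each upper class limit is one less than the next lower class limit; the function names are hypothetical helpers, not part of any library.

```python
def build_classes(first_lower, width, num_classes):
    """Return (lower limit, upper limit, midpoint) for equal-width classes of whole-number data."""
    classes = []
    for i in range(num_classes):
        lower = first_lower + i * width
        upper = lower + width - 1          # one below the next lower class limit
        midpoint = (lower + upper) / 2     # halfway between the two class limits
        classes.append((lower, upper, midpoint))
    return classes

def find_class(value, classes):
    """Return the (lower, upper) limits of the one class that contains value, or None."""
    for lower, upper, _ in classes:
        if lower <= value <= upper:
            return (lower, upper)
    return None

audit_classes = build_classes(first_lower=10, width=5, num_classes=5)
for lower, upper, midpoint in audit_classes:
    print(f"{lower}-{upper}: midpoint {midpoint:g}")

print("Smallest value 12 falls in class", find_class(12, audit_classes))
print("Largest value 33 falls in class", find_class(33, audit_classes))
```

Because the classes do not overlap, find_class can return at most one class for any data value, which is the requirement stated for class limits.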
Exercises
1. The dean of the College of Business and Economics wishes to determine the amount
of studying business and economics students do. He selects a random sample of 30
students and determines the number of hours each student studies per week: 15.0,
23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4,
18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6. Organize
the data into a frequency distribution.
2. Consider the following frequency distribution:
Class Intervals   Frequency
10 - 15           10
16 - 21           20
22 - 27           30
28 - 33           25
Total             85
Required:
i. Determine LCL, LCB, UCL, and UCB for each class.
ii. Develop additional columns for class marks, relative frequencies, less than
cumulative frequencies, and more than cumulative frequencies.
Cumulative Frequency
There are two types of cumulative frequency:
• Less than type cumulative frequency
• Greater than type cumulative frequency
Less than type cumulative frequency
The total frequency of a particular class and all classes prior to that class is called the
less than type cumulative frequency of that class, or simply the cumulative frequency of that
class.
Example
The marks of 30 students of a class, obtained in a test out of 75, are given below:
42, 21, 50, 37, 42, 37, 38, 42, 49, 52, 38, 53, 57, 47, 29, 59, 61, 33, 17, 17, 39, 44, 42, 39, 14, 7,
27, 19, 54, 51.
A frequency table and a less than type cumulative frequency table with equal class intervals can be formed from these data.
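One way to build such a table is sketched below in Python. The marks are the 30 values listed in the example; the choice of classes 0-9, 10-19, ..., 60-69 is an assumption made for illustration, since the original table is not reproduced in these notes.

```python
from itertools import accumulate

marks = [42, 21, 50, 37, 42, 37, 38, 42, 49, 52, 38, 53, 57, 47, 29,
         59, 61, 33, 17, 17, 39, 44, 42, 39, 14, 7, 27, 19, 54, 51]

# Assumed equal class intervals of width 10: 0-9, 10-19, ..., 60-69.
classes = [(low, low + 9) for low in range(0, 70, 10)]

# Frequency of each class: number of marks between its lower and upper limits.
frequencies = [sum(low <= m <= high for m in marks) for low, high in classes]

# Less than type cumulative frequency: this class plus all classes before it.
less_than_cf = list(accumulate(frequencies))

print(f"{'Class':<8}{'Frequency':>10}{'Less than CF':>14}")
for (low, high), f, cf in zip(classes, frequencies, less_than_cf):
    print(f"{low}-{high:<6}{f:>10}{cf:>14}")
```

The last cumulative frequency equals the total number of students (30), which is a useful check on the table.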
Greater than type cumulative frequency
The sum of the frequencies of a particular class and all the classes after it is called the
"greater than" type cumulative frequency of that class. The greater than cumulative frequencies
are related to the lower class limits and form a decreasing sequence.
Example
The marks of 30 students of a class, obtained in a test out of 75, are given below:
42, 21, 50, 37, 42, 37, 38, 42, 49, 52, 38, 53, 57, 47, 29, 59, 61, 33, 17, 17, 39, 44, 42, 39, 14, 7,
27, 19, 54, 51.
On the basis of the given data a frequency table and a cumulative frequency "greater than" type
of table with equal class interval would look like this.
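A minimal Python sketch of the "greater than" calculation, using the 30 marks from the less than example. The classes of width 10 starting at 0 are an assumption made for illustration.

```python
from itertools import accumulate

# Marks of the 30 students, as listed in the less than example.
marks = [42, 21, 50, 37, 42, 37, 38, 42, 49, 52, 38, 53, 57, 47, 29,
         59, 61, 33, 17, 17, 39, 44, 42, 39, 14, 7, 27, 19, 54, 51]

WIDTH = 10  # assumed class width
lower_limits = list(range(0, 70, WIDTH))  # 0, 10, ..., 60

# Frequency of each class: marks from the lower limit up to the next one.
frequencies = [sum(lo <= m < lo + WIDTH for m in marks) for lo in lower_limits]

# Greater than cumulative frequency: accumulate from the last class
# backwards, then restore the original class order.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

for lo, cf in zip(lower_limits, greater_than_cf):
    print(f"{lo} or more: {cf}")
```

The first entry equals the total number of observations and the sequence decreases from there, in line with the description above.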
2.1.3 Diagrammatical presentation of data
Although tables carry a good deal of information for readers who know how to
read them, they may not be easily understood by a general audience. For this
reason we introduce other means of data presentation. Diagrammatic presentation
of data has the following advantages:
 It conveys the required information in a short time and without
complexity.
 It is more attractive than figures.
 It facilitates comparison.
Diagrammatic presentations are particularly important for categorical data.
Several types of diagrammatic presentation are in common use. Some of these
are discussed next.
A. Bar charts
Bar charts are one-dimensional rectangular diagrams, usually used to display
qualitative distributions. Bar charts have the following common characteristics:
a. The length or height of the bar associated with a category or class interval
represents the corresponding frequency.
b. The bars are equally spaced: equal space is left between consecutive
bars.
c. Each bar has equal width.
There are different types of bar charts used for data presentation: vertical bar
graph, horizontal bar graph, grouped bar graph, stacked bar graph, etc.
Example: Consider the preceding illustration about the level of education (LOE) of the
employees of a certain organization.
LOE         No. of employees
Diploma     15
Bachelor    20
Master      10
Ph.D        5
Total       50
[Figure: vertical bar graph and horizontal bar graph of the number of employees by level of education]
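To make the idea concrete without any plotting library, the following Python sketch prints a text version of the horizontal bar graph for the level of education data. The function name `horizontal_bar_chart` and the use of `#` as the bar symbol are choices made for this illustration only.

```python
# Level of education (LOE) frequencies from the example table.
employees_by_loe = {"Diploma": 15, "Bachelor": 20, "Master": 10, "Ph.D": 5}


def horizontal_bar_chart(frequencies, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    label_width = max(len(label) for label in frequencies)
    return [f"{label:<{label_width}} | {symbol * count} {count}"
            for label, count in frequencies.items()]


chart_lines = horizontal_bar_chart(employees_by_loe)
print("\n".join(chart_lines))
```

Each bar has the same height (one text line) and the bars are equally spaced, so only the length carries information, as required of a bar chart.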
Example: The following describes types of clothes and area of sales of these clothes
for the year 2005 by a textile factory. Present it using a bar chart.
Types of cloth    Area of Sales        Total
                  Local     Export
Men’s             150       100        250
Women’s           125       225        350
Children’s        70        110        180
Total             345       435        780
[Figure: Total sales for 2005 by the type of clothes manufactured and areas of sales. Three bar graphs with categories Men’s, Women’s, Children’s, and Total, and legend Local, Export, Total]
B. Pie chart
A pie chart is a circle divided into component sectors according to the proportion of
each component in the total. It is constructed by dividing the 360° of a circle into angles,
each of which is proportional to the size of the respective component.
Example: Take the data of employees of organization X. Present it using pie-chart.
LOE         fi     fi/n            Ai (degrees)
Diploma     15     15/50 = 0.3     0.3 x 360 = 108
Bachelor    20     20/50 = 0.4     0.4 x 360 = 144
Masters     10     10/50 = 0.2     0.2 x 360 = 72
Ph.D        5      5/50 = 0.1      0.1 x 360 = 36
Total       50     1               360
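The angle calculation in the table can be reproduced with a few lines of Python. This is a sketch only; the variable names are chosen for illustration.

```python
# Frequencies by level of education, from the table above.
employees_by_loe = {"Diploma": 15, "Bachelor": 20, "Masters": 10, "Ph.D": 5}

n = sum(employees_by_loe.values())  # total number of employees

# Relative frequency fi/n of each component.
relative = {label: f / n for label, f in employees_by_loe.items()}

# Sector angle Ai = (fi / n) x 360 degrees. Multiplying before dividing
# keeps the arithmetic exact for these values.
angles = {label: f * 360 / n for label, f in employees_by_loe.items()}

for label in employees_by_loe:
    print(f"{label:<9} fi/n = {relative[label]:.1f}  angle = {angles[label]:.0f}")
```

The angles must add up to 360, which is a useful check before drawing the sectors.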
In a pie chart it is better to shade each component in a different color. It is also
helpful to label each component with its percentage for easy comparison.
[Figure: pie chart of the employees of organization X by level of education, with sectors for Diploma, Bachelor, Masters, and Ph.D]
Graphical presentation of data
Histogram
A histogram is a type of diagrammatic presentation commonly used for frequency
distributions with continuous classes. The following are important features of a
histogram.
 It cannot be used with a frequency distribution that has an open-ended class.
 No space is left between bars.
What is a Histogram?
A histogram is used to summarize discrete or continuous data. In other words, it
provides a visual interpretation of numerical data by showing the number of data
points that fall within a specified range of values (called “bins”). It is similar to a
vertical bar graph. However, a histogram, unlike a vertical bar graph, shows no gaps
between the bars.
Parts of a Histogram
1. The title: The title describes the information included in the histogram.
2. X-axis: The X-axis shows the intervals, i.e. the scale of values under which the
measurements fall.
3. Y-axis: The Y-axis shows the number of times that the values occurred within
the intervals set by the X-axis.
4. The bars: The height of the bar shows the number of times that the values
occurred within the interval, while the width of the bar shows the interval
that is covered. For a histogram with equal bins, the width should be the same
across all bars.
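The counting that underlies a histogram can be sketched as follows. The data values and the bin settings are hypothetical and serve only to illustrate how each observation is assigned to one of several equal-width bins.

```python
# Hypothetical measurements, for illustration only (not from the text).
data = [12.5, 14.0, 15.2, 17.8, 18.1, 19.9, 21.3, 22.0, 24.6, 27.4]

# Assumed equal bins: [10, 15), [15, 20), [20, 25), [25, 30).
BIN_START, BIN_WIDTH, N_BINS = 10, 5, 4

counts = [0] * N_BINS
for x in data:
    index = int((x - BIN_START) // BIN_WIDTH)  # position of the bin holding x
    if 0 <= index < N_BINS:                    # ignore values outside the bins
        counts[index] += 1

# The bar height for each bin is its count; all bars have the same width.
for i, count in enumerate(counts):
    lo = BIN_START + i * BIN_WIDTH
    print(f"[{lo}, {lo + BIN_WIDTH})  {'#' * count}  {count}")
```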
Frequency Polygon
A frequency polygon is a graphical form of representation of data. It is used to depict
the shape of the data and to depict trends. It is usually drawn with the help of a
histogram but can be drawn without it as well. A histogram is a series of rectangular
bars with no space between them and is used to represent frequency distributions.
Steps to Draw a Frequency Polygon
 Mark the class intervals for each class on the horizontal axis. The frequency is
plotted on the vertical axis.
 Calculate the class mark for each class interval. The formula for the class mark is:
Class mark = (Upper limit + Lower limit) / 2
 Mark all the class marks on the horizontal axis. The class mark is also known as
the mid-value of the class.
 Corresponding to each class mark, plot the given frequency. The
height always depicts the frequency. Make sure that the frequency is plotted
against the class mark and not the upper or lower limit of any class.
 Join consecutive plotted points using line segments. The curve obtained will be
kinked.
 The resulting curve is called the frequency polygon.
Note that the above method is used to draw a frequency polygon without drawing a
histogram. You can also draw a histogram first by drawing rectangular bars against
the given class intervals. After this, join the midpoints of the tops of the bars to obtain
the frequency polygon. Remember that the bars will have no spaces between them
in a histogram.
As an example, suppose the class marks are 54.5, 64.5, 74.5, and so on up to 94.5.
We also plot the class marks just before the first class and just after the last class,
i.e. 44.5 and 104.5, to start and end the polygon.
Then, the frequencies corresponding to the class marks are plotted against each
class mark. The frequencies for class marks 44.5 and 104.5 are zero, so these two
points lie on the x-axis. They are used only to give a closed shape to the polygon.
The polygon looks like this:
Frequency Polygon
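The points of such a polygon can be generated with a short Python sketch. The class limits 50-59, ..., 90-99 are an assumption consistent with the class marks 54.5 to 94.5, and the frequencies are hypothetical values used only for illustration.

```python
# Assumed class limits, consistent with class marks 54.5, 64.5, ..., 94.5.
class_limits = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
# Hypothetical frequencies, for illustration only (not from the text).
frequencies = [4, 9, 12, 8, 3]

# Class mark = (upper limit + lower limit) / 2
class_marks = [(lo + hi) / 2 for lo, hi in class_limits]
spacing = class_marks[1] - class_marks[0]  # distance between class marks

# Add one zero-frequency point before the first class mark and one after
# the last, so that the polygon closes on the horizontal axis.
points = [(class_marks[0] - spacing, 0)]
points += list(zip(class_marks, frequencies))
points.append((class_marks[-1] + spacing, 0))

for mark, f in points:
    print(f"class mark {mark:>5}: frequency {f}")
```

Joining consecutive points in this list with line segments gives the frequency polygon.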
Cumulative frequency curve or Ogive
The word Ogive is a term used in architecture to describe curves or curved
shapes. Ogives are graphs that are used to estimate how many values lie below
or above a particular value in the data. To construct an Ogive, the cumulative
frequency of each class is first calculated using a frequency table. This is
done by adding the frequencies of all the previous classes to the frequency of
the class. The last number in the less than cumulative frequency table is always
equal to the total frequency.
Graphs of a frequency distribution are used to exhibit the characteristics of
discrete and continuous data. Such figures are more appealing to the eye than
tabulated data. They also facilitate the comparative study of two or more
frequency distributions, since we can compare the shape and pattern of the
distributions.
The two methods of Ogives are:
 Less than Ogive
 Greater than or more than Ogive
The graph given above represents less than and the greater than Ogive curve.
The rising curve (Brown Curve) represents the less than Ogive, and the falling
curve (Green Curve) represents the greater than Ogive.
Less than Ogive
The frequencies of all preceding classes are added to the frequency of a class.
This series is called the less than cumulative series. It is constructed by adding
the first-class frequency to the second-class frequency and then to the third class
frequency and so on. The downward accumulation results in the less than
cumulative series.
Greater than or More than Ogive
The frequencies of the succeeding classes are added to the frequency of a class.
This series is called the more than or greater than cumulative series. It is
constructed by subtracting the first class frequency from the total, then the
second class frequency from that result, then the third class frequency, and so
on. The upward accumulation results in the greater than or more than cumulative
series.
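Both constructions can be sketched in Python. The class boundaries and frequencies below are hypothetical and are used only to show how the two series of plotting points are obtained: less than values are plotted against upper class boundaries, and more than values against lower class boundaries.

```python
from itertools import accumulate

# Hypothetical frequency distribution, for illustration only.
boundaries = [0, 10, 20, 30, 40, 50]  # class boundaries
frequencies = [5, 8, 12, 10, 5]       # one frequency per class
total = sum(frequencies)

# Less than ogive: running total plotted at each upper class boundary.
less_than = list(zip(boundaries[1:], accumulate(frequencies)))

# More than ogive: start from the total at the first lower boundary and
# subtract each class frequency in turn.
more_than = []
remaining = total
for lower, f in zip(boundaries[:-1], frequencies):
    more_than.append((lower, remaining))
    remaining -= f

print("less than:", less_than)
print("more than:", more_than)
```

The less than series rises to the total frequency, while the more than series starts at the total and falls.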