Statistical Methods-BITS (1)
October 2024
Overall Course Objectives
Upon completing this course, students will be able to:
distributions;
Singular form
systematic collection and interpretation of numerical data to make a decision
Classification of Statistics
Descriptive Statistics
Mainly concerned with the methods and techniques used in collection,
The sample shows that 40% of year I students have a positive attitude toward the
delivery of lectures.
Drawing graphs that show the difference in the ‘scores’ of fourth year
Inferential Statistics
Deals with the method of inferring or drawing conclusions about the
Utilizes sample data to make decisions about the entire data set
Definition of Some Basic Statistical Terms
Data
a collection of related facts and figures from which conclusions may be
drawn
Population/target population
a totality of things, objects, people, etc. about which information is being
collected
Sample
part of a population selected to draw conclusions about the population
Subset of a population
[Figure: the sample shown as a subset of the population]
Census
a complete enumeration of the population. In most real problems, however, a census
cannot be carried out, so we take a sample.
Statistic
A value computed from the sample, used to describe the sample.
Parameter
A descriptive measure (value) computed from the population.
Variable
is a characteristic or attribute that can assume different values.
Sampling frame
A list of people, items or units from which the sample is taken.
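The difference between a parameter and a statistic can be illustrated with a small Python sketch. The population values below are hypothetical (they do not come from these notes), the population list is assumed to serve as the sampling frame, and `random.sample` draws a simple random sample from it.

```python
import random
import statistics

# Hypothetical population: exam scores of 10 students (illustrative values only)
population = [55, 62, 70, 48, 81, 66, 73, 59, 90, 64]

# Parameter: a descriptive measure computed from the whole population
population_mean = statistics.mean(population)

# Here the population list itself plays the role of the sampling frame
random.seed(1)  # fixed seed so the example is reproducible
sample = random.sample(population, k=4)

# Statistic: the same measure computed from the sample only
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("sample:", sample)
print("statistic (sample mean):", sample_mean)
```

The sample mean generally differs from the population mean; a different seed gives a different sample and therefore a different statistic, while the parameter stays fixed.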
Statistical Data Properties
Stages in Statistical Investigation
1. Data Collection
The processes of measuring, assembling and gathering data
2. Data Organization
It is the stage where we edit our data.
The collected data may contain irrelevant figures, incorrect facts, omissions and
mistakes.
3. Data Presentation
The organized data can now be presented in the form of tables, diagrams and
graphs.
4. Data Analysis
Study the data to draw conclusions about the population parameter
5. Data Interpretation
Draw valid conclusions from the results obtained through data analysis
Uses and Limitations of Statistics
Uses of Statistics
Condenses and summarizes complex data
Limitations of Statistics
Statistics does not deal with single (individual) values; rather, it deals with
aggregate values
the subject
Scales of Measurement
A variable in statistics is any characteristic which can take on different
values for different elements when data are collected
Measurement “is assigning numbers to objects, events, or abstract
concepts according to a known set of rules”
Ordinal Scale
Interval Scale
Interval Scales of Measurement
A measure of order and quantity
Difference between values can be calculated.
Possible to add and subtract.
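Multiplication and division (ratios) are not meaningful, because the zero point is arbitrary (there is no true zero)
Example: Temperature. The difference between 10°C (50°F) and 20°C (68°F) is the same as the difference between 25°C (77°F) and 35°C (95°F).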
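The temperature example can be checked numerically with a short Python sketch, assuming the standard conversion F = 9/5 × C + 32. It shows that equal differences stay equal after converting the scale, while ratios do not, which is why multiplication and division are not meaningful on an interval scale.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit: F = 9/5 * C + 32."""
    return 9 / 5 * c + 32

# The two pairs of temperatures from the example
pairs = [(10, 20), (25, 35)]

# Equal differences in Celsius remain equal in Fahrenheit (both are 18 F)
diffs = []
for low, high in pairs:
    diff_f = celsius_to_fahrenheit(high) - celsius_to_fahrenheit(low)
    diffs.append((high - low, diff_f))
    print(f"{low} C to {high} C: {high - low} C apart = {diff_f:.0f} F apart")

# Ratios are not meaningful: "20 C is twice 10 C" is no longer true in Fahrenheit
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)
print(f"ratio in C: {ratio_c:.2f}, ratio in F: {ratio_f:.2f}")
```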
1.2. Methods of Data Collection and Presentation
Sources of Data
Primary data
data measured or collected by the investigator or the user directly from the source
the data you collect is unique to you and your research and, until you publish, no one
The primary sources of data are objects or persons from which we collect the
Secondary data
second-hand data or information that was gathered by
someone else
Methods of Data Collection
Planning data collection requires:
Identifying the source and elements of the data
There are three major methods of data collection.
1) Observational or measurement.
2) Interview with questionnaires.
a. Face to face interview.
b. Telephone interview.
c. Self-administered questionnaires returned by mail (mailed
questionnaire).
3) The use of documentary sources
Observation or measurement (direct personal observation)
In this case data can be obtained through direct observation or
measurement. This requires training and monitoring of the measurer to
ensure the use of standard procedure.
Provides accurate information but it is expensive and inconvenient.
Example: laboratory tests, clinical measurements, physical
examination, etc.
Interview with questionnaires: Here one drafts a detailed
questionnaire. The questionnaires can either be mailed to
the respondents to fill in and return, or be put in the charge
of enumerators who go around and fill them in after
obtaining the desired information.
Questionnaires: are written documents which instruct the
reader or listener to answer the questions written on them.
Respondents (Interviewees): are the individuals who
answer the questions on the questionnaire.
Interviewers: are the individuals who record the
responses given by the respondents.
a) Face to Face Interviews (questionnaires in charge of enumerators)
The interviewer knows exactly who is responding to the questionnaire.
Advantages
The interviewer can help the respondent if he/she has difficulty in
understanding the questions. The difficulty could be due to language,
concentration or limited intellectual capacity.
There is more flexibility in presenting the items; they can range from closed
to open.
There is the ability to use skip patterns.
Skip patterns mean skipping a question or a group of questions which are
not applicable.
Disadvantages
It is costly in terms of time and money.
Attributes of the interviewer may affect the responses due to:
a) bias of the interviewer and
b) his/her social or ethnic characteristics.
An untrained interviewer may distort the meaning of the questions.
b. Telephone Interviews
Advantages
• It is less expensive in time and money compared with face
to face interviews.
• The interviewer is able to help the respondent if he/she
doesn’t understand the question (as seen with face to face
interview)
• Broad representative samples can be obtained for those
who have telephone lines.
Disadvantages
Under-representation of those groups which do not have
telephones.
The intended respondent may be substituted by another person.
Problems with telephone numbers that are unlisted in the directory.
c. Self-administered questionnaires returned by
mail (mailed questionnaire)
Here the questionnaire is mailed to the respondents to be filled.
This is sometimes known as self-enumeration.
Advantages
These are the cheapest.
There is no need for trained interviewer.
There is no interviewer bias.
Disadvantages
• Low response rate
• Incomplete questionnaires due to omissions or invalid
responses.
• No assurance that the questionnaire was answered by the right
person
• Needs intensive follow-up to get a high response rate.
3. The use of documentary sources
Extracting information from existing sources (e.g. Hospital
records) is much less expensive than the other two methods. It
can be an important source of data.
Advantages of secondary data
Secondary data may help to clarify or redefine the definition of the
problem as part of the exploratory research process.
Provides a larger database as compared to primary data
Time saving
Does not involve collection of data
Disadvantages of secondary data
It is difficult to get the information needed when records are
compiled in a non-standardized manner.
Lack of availability
Inaccurate data
Lack of relevance
Insufficient data
Methods of Data Presentation
The major objective of data presentation is to display data visually in a more understandable form, using
Tables,
Diagrams, and
Graphs
Tabular presentation of data
Tables are important for summarizing large volumes of data in a more
understandable way.
Tables can be
Simple (one-way) table: presents one characteristic, for example, age
distribution.
Two-way table: presents two characteristics, one in the rows and one in the columns, for example,
age versus sex.
Higher-order table: presents more than two characteristics in one
table.
Frequency Distribution
It is the organization of raw data in table form, using classes and frequencies.
Categorical Frequency Distribution
The categorical frequency distribution is used for data which can be placed
Columns: (A) Class, (B) Tally, (C) Frequency, (D) Percent
Example: The following data give the smoking status by gender of a sample of 20 health workers in
Jimma Hospital, 1986 E.C. Construct a categorical frequency
distribution.
Observation 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Gender M F M M F F F M M M F F F F M F M F M M
Smoking status Y N N Y N N Y N N N N N N Y Y Y N N Y Y
Characteristics Tally Frequency
Gender
Male //// //// 10
Female //// //// 10
Smoking status
No //// //// // 12
Yes //// /// 8
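The categorical frequency distribution above can be reproduced with Python's `collections.Counter`. This is a sketch: the `tally` and `categorical_table` helpers are hypothetical, tally marks are written here simply as groups of five strokes, and a percent column (the fourth column of a categorical frequency distribution) is added.

```python
from collections import Counter

# Data from the example (observations 1-20)
gender = "M F M M F F F M M M F F F F M F M F M M".split()
smoking = "Y N N Y N N Y N N N N N N Y Y Y N N Y Y".split()

def tally(freq):
    """Write a tally as groups of five strokes separated by spaces."""
    groups = ["/////"] * (freq // 5)
    if freq % 5:
        groups.append("/" * (freq % 5))
    return " ".join(groups)

def categorical_table(values):
    """Return (class, tally, frequency, percent) rows, classes in sorted order."""
    counts = Counter(values)
    n = len(values)
    return [(cat, tally(f), f, 100 * f / n) for cat, f in sorted(counts.items())]

for name, values in [("Gender", gender), ("Smoking status", smoking)]:
    print(name)
    for cat, marks, freq, pct in categorical_table(values):
        print(f"  {cat:<3} {marks:<15} {freq:>3} {pct:5.1f}%")
```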
Ungrouped Frequency Distribution
It is a distribution that uses individual data values along with their
frequencies.
It is often constructed for a small set of data on a discrete variable (when data are
The major components of this type of frequency distributions are class, tally,
Example: The ages (in years) of 20 women who attended health education at Jimma
Health Center in 1986 are given as follows. Construct an ungrouped frequency
distribution.
30 25 23 41 39 27 41 24 32 29 29 35 31 36 33 36 42
35 37 41
Age(xj) 23 24 25 27 29 30 31 32 33 35 36 37 39 41 42
Tally / / / / // / / / / // // / / /// /
Frequency(f) 1 1 1 1 2 1 1 1 1 2 2 1 1 3 1
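The same ungrouped frequency distribution can be obtained in Python by counting each distinct age and listing the ages in increasing order. This is a minimal sketch using `collections.Counter`; the tally column is printed as one stroke per occurrence.

```python
from collections import Counter

# Ages (in years) of the 20 women from the example
ages = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41]

freq = Counter(ages)
table = sorted(freq.items())  # (age, frequency) pairs in increasing order of age

print("Age  Tally  Frequency")
for age, f in table:
    print(f"{age:>3}  {'/' * f:<5}  {f}")
```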
the data must be grouped in which each class has more than one unit in
width.
We use it when the range of the data is large, and for data from a continuous
variable.
Guidelines for classes
There should be 5 to 20 classes. The number of classes K can be determined using Sturges' rule:
$K = 1 + 3.322 \log_{10} n$
Classes should be continuous.
Class width: $W = \dfrac{R}{K}$, where R is the range and K is the number of classes
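Sturges' rule and the class-width formula can be sketched in Python as follows. The helper names `sturges_classes` and `class_width` are hypothetical, and both results are rounded up with `math.ceil`, which assumes the data are recorded in whole units (unit of measurement 1).

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule, K = 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_width(smallest, largest, k):
    """Class width W = R / K with R = largest - smallest, rounded up to a whole unit."""
    return math.ceil((largest - smallest) / k)

print(sturges_classes(20), class_width(23, 42, sturges_classes(20)))
print(sturges_classes(60), class_width(10, 74, sturges_classes(60)))
```

With n = 20, smallest 23 and largest 42 this gives K = 6 and W = 4; with n = 60, smallest 10 and largest 74 it gives K = 7 and W = 10, matching the worked examples in this section. Rounding up (rather than to the nearest integer) ensures the K classes cover the whole range.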
Class limit (CL)
It separates one class from another.
Class limits have gaps between the upper limit of one class and the lower limit of the next class.
Class boundary (CB)
Separates one class in a grouped frequency distribution from another.
The boundary has one more decimal place than the raw data.
There is no gap between the upper boundary of one class and the lower boundary of the next class.
Unit of measurement (U)
This is the smallest possible difference between successive values, e.g. 1, 0.1, 0.01, …
The class width is also the difference between the lower limits (or the upper limits) of two
consecutive classes.
sum by two.
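The relationship between class limits, the unit of measurement and class boundaries can be sketched in Python. The helper `boundaries_from_limits` is hypothetical; it applies the rule "subtract half the unit from the lower limit, add half the unit to the upper limit", and the classes 10-19, 20-29, … with unit 1 are chosen for illustration.

```python
def boundaries_from_limits(lower, upper, unit):
    """Class boundaries: subtract unit/2 from the lower limit, add unit/2 to the upper limit."""
    half = unit / 2
    return lower - half, upper + half

# Classes of width 10 starting at 10, data recorded in whole numbers (unit = 1)
limits = [(low, low + 9) for low in range(10, 80, 10)]
boundaries = [boundaries_from_limits(low, up, unit=1) for low, up in limits]

# Cross-check: the boundary between two adjacent classes is also the midpoint of the
# gap between one class's upper limit and the next class's lower limit
for i in range(len(limits) - 1):
    upper_limit = limits[i][1]
    next_lower_limit = limits[i + 1][0]
    assert (upper_limit + next_lower_limit) / 2 == boundaries[i][1]

for (low, up), (lb, ub) in zip(limits, boundaries):
    print(f"limits {low}-{up}: boundaries {lb}-{ub}")
```

Because adjacent boundaries coincide (for example 19.5 ends one class and begins the next), there is no gap between classes when boundaries are used.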
Steps to construct grouped frequency distribution
Find smallest (S) and largest (L) values in your data
Take the smallest value as the lower limit of the first class, and add the class width repeatedly to get the lower limits of the
following classes
To get the upper limit of the first class, subtract the unit of measurement from the lower limit of the second class; then add the
class width repeatedly to get the remaining upper class limits
Subtract half the unit of measurement from each lower class limit to get the lower class boundary, and add half the
unit of measurement to each upper class limit to get the upper class boundary
Tally the data
Example: The ages (in years) of 20 women who attended health education at Jimma
Health Center in 1986 are given as follows. Construct a grouped frequency
distribution.
30 25 23 41 39 27 41 24 32 29 29 35 31 36 33 36 42
35 37 41
n=20
k = 1 + 3.322(log 20) = 1 + 3.322(1.3010) = 5.322, rounded up to k = 6
w = (42 - 23)/6 = 3.17, rounded up to w = 4
The grouped frequency table using Sturges' formula
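Following the construction steps listed earlier, the grouped frequency table for these ages can be sketched in Python. This is an illustration, not the table from the original slide: it assumes ages are recorded in whole years (unit of measurement 1) and rounds both K and W up.

```python
import math

# Ages (in years) of the 20 women from the example
ages = [30, 25, 23, 41, 39, 27, 41, 24, 32, 29, 29, 35, 31, 36, 33, 36, 42, 35, 37, 41]
unit = 1  # unit of measurement: ages are recorded in whole years

n = len(ages)
k = math.ceil(1 + 3.322 * math.log10(n))         # Sturges' rule: 5.322 -> 6 classes
w = math.ceil((max(ages) - min(ages)) / k)       # (42 - 23) / 6 = 3.17 -> width 4

rows = []
for i in range(k):
    lower = min(ages) + i * w                    # lower class limit
    upper = lower + w - unit                     # next lower limit minus one unit
    frequency = sum(lower <= a <= upper for a in ages)
    rows.append((lower, upper, lower - unit / 2, upper + unit / 2, frequency))

print("Limits  Boundaries   Frequency")
for lower, upper, lb, ub, frequency in rows:
    print(f"{lower}-{upper}   {lb}-{ub}   {frequency}")
```

The classes are 23-26, 27-30, 31-34, 35-38, 39-42 and 43-46 with frequencies 3, 4, 3, 5, 5 and 0. Because six classes of width 4 span 23 to 46, the last class is empty; keeping or dropping it is a presentation choice.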
Consider the following data
30 40 41 33 70 51 37 10 31 21 60 44 63 72 23 37 65
14 25 28 64 39 17 74 53 34 51 27 43 45 33 16 23 68
47 32 36 19 48 49 67 60 45 54 44 30 15 38 22 46 61
25 29 55 48 49 35 13 37 36
Prepare i) absolute frequency distribution;
ii) relative frequency distribution;
iii) less than and more than cumulative
frequency distributions.
R = 74 - 10 = 64, n = 60
Using Sturges' Rule:
K = 1 + 3.322(log10 60) = 1 + 3.322(1.778151) = 6.907 ≈ 7
W = 64/7 = 9.14 ≈ 10
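Class Frequency RF LCF MCF
10-19 7 0.117 7 60
20-29 9 0.150 16 53
30-39 15 0.250 31 44
40-49 13 0.217 44 29
50-59 5 0.083 49 16
60-69 8 0.133 57 11
70-79 3 0.050 60 3
Total 60 1.000
(RF = relative frequency, LCF = less-than cumulative frequency, MCF = more-than cumulative frequency)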
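The three distributions in this table can be recomputed in Python as a check. This sketch assumes the seven classes 10-19, …, 70-79 obtained above; the more-than cumulative frequency of a class is taken as the number of values greater than or equal to its lower limit.

```python
import itertools

data = [30, 40, 41, 33, 70, 51, 37, 10, 31, 21, 60, 44, 63, 72, 23, 37, 65,
        14, 25, 28, 64, 39, 17, 74, 53, 34, 51, 27, 43, 45, 33, 16, 23, 68,
        47, 32, 36, 19, 48, 49, 67, 60, 45, 54, 44, 30, 15, 38, 22, 46, 61,
        25, 29, 55, 48, 49, 35, 13, 37, 36]

n = len(data)
lowers = list(range(10, 80, 10))  # lower limits of the K = 7 classes of width W = 10

freqs = [sum(low <= x <= low + 9 for x in data) for low in lowers]
rel_freqs = [f / n for f in freqs]                 # relative frequency
lcf = list(itertools.accumulate(freqs))            # less-than cumulative frequency
mcf = [n - c + f for c, f in zip(lcf, freqs)]      # values >= the class's lower limit

print("Class  Freq   RF    LCF  MCF")
for low, f, rf, lc, mc in zip(lowers, freqs, rel_freqs, lcf, mcf):
    print(f"{low}-{low + 9}  {f:>4}  {rf:.3f}  {lc:>3}  {mc:>3}")
```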
Diagrammatic and Graphic presentation of the data
One of the most effective and interesting alternative ways in which a
There are several ways in which statistical data may be displayed pictorially, such as:
Bar chart
Histogram
Pie Chart
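A pie chart is a circular diagram in which each component is represented by a sector of the circle whose area is
proportional to the component's share of the total.
$\text{Angle of sector} = \dfrac{\text{Component part}}{\text{Total}} \times 360^\circ$
These angles are drawn in the circle by means of a protractor to show the different
components.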
Pie Chart (Example)
The following table gives the details of quarterly sales of a Sport Wear
Quarter Profit($,000,000)
1st quarter 100
2nd quarter 300
3rd quarter 500
4th quarter 600
Total 1500
Construct a pie chart
Quarter Profit($,000,000) Angle of sector (degrees) Percent (%)
1st quarter 100 24 6.67
2nd quarter 300 72 20.00
3rd quarter 500 120 33.33
4th quarter 600 144 40.00
Total 1500 360 100.00
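The sector angles and percentages follow directly from the pie-chart formula; a short Python sketch using the quarterly figures from the example is:

```python
profits = {"1st quarter": 100, "2nd quarter": 300, "3rd quarter": 500, "4th quarter": 600}
total = sum(profits.values())  # 1500

# Angle of sector = component part / total * 360 degrees
angles = {q: p * 360 / total for q, p in profits.items()}
# Percentage share of each component
percents = {q: p * 100 / total for q, p in profits.items()}

for q in profits:
    print(f"{q}: {angles[q]:.0f} degrees, {percents[q]:.2f}%")
```

The angles add up to 360 degrees and the percentages to 100%, which is a quick check before drawing the sectors with a protractor.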
Bar Chart
Uses vertical or horizontal bars to represent the frequencies of a distribution.
When drawing a bar chart, we have to consider the following two points.
Make the units on the axis that are used for the frequency equal in size
Simple Bar Chart
Used to represent data involving only one variable classified on spatial,
Multiple Bar Chart
When two or more interrelated series of data are depicted by a bar diagram
Example: Suppose we have export and import (in millions) figures for a
[Figure: multiple bar chart of export and import figures for 2010, 2011 and 2012]
Stratified/Stacked Bar Chart
Used to represent data in which the total magnitude is divided into
different parts or components.
First make simple bars for each class taking total magnitude in that class
and then divide these simple bars into parts in the ratio of various
components
The table below shows the profits ($ millions) of companies X, Y and Z from the sales of different
items in the 1st quarter of the year. Draw a stratified/stacked bar chart.
Company Shoe T-shirt Ball Total
X 30 50 40 120
Y 33 16 27 76
Z 37 13 37 87
[Figure: stratified/stacked bar chart of sales ($,000,000) by company (X, Y, Z), each bar divided into Shoe, T-shirt and Ball segments]
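Dividing each simple bar into parts amounts to stacking the components on top of one another. The Python sketch below computes where each segment starts and ends, using the company data from the table; the helper `stack_segments` is hypothetical and no plotting library is assumed, so the output is just the segment positions one would draw.

```python
sales = {
    "X": {"Shoe": 30, "T-shirt": 50, "Ball": 40},
    "Y": {"Shoe": 33, "T-shirt": 16, "Ball": 27},
    "Z": {"Shoe": 37, "T-shirt": 13, "Ball": 37},
}

def stack_segments(components):
    """Return (item, bottom, top) for each segment of one stacked bar, in the given order."""
    segments = []
    bottom = 0
    for item, value in components.items():
        segments.append((item, bottom, bottom + value))
        bottom += value  # the next segment starts where this one ends
    return segments

for company, components in sales.items():
    segments = stack_segments(components)
    total = segments[-1][2]  # the top of the last segment is the bar's total height
    print(company, segments, "total:", total)
```

The top of each bar equals the company total (120, 76 and 87), so the simple bar for the total and the stacked bar have the same height.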
Deviation Bar Chart
Used when the data contains both positive and negative values such as data
commodity.
Commodity Net profit
Soap 80
Sugar -95
Coffee 125
[Figure: deviation bar chart of net profit for Soap, Sugar and Coffee, with bars above and below zero]
Histogram
Histogram is a special type of bar graph in which the horizontal scale
represents classes of data values and the vertical scale represents frequencies.
The heights of the bars correspond to the frequency values, and the drawn
represent frequencies.
A histogram shows the shape of continuous data, checks for homogeneity, and
To construct a histogram, we split the range of data into equal intervals, “bins,”