Biostat Lecture Four

Chapter Four discusses Descriptive Statistics, focusing on Measures of Central Tendency (MCT) which include the Arithmetic Mean, Median, and Mode. It outlines the characteristics of a good MCT, the properties of each measure, and the appropriate contexts for their use based on data distribution. The chapter also introduces Measures of Dispersion, emphasizing the importance of understanding data variability alongside central tendency.

Uploaded by

birukfirdut

Chapter Four

Descriptive Statistics:

teshomedemis112@gmail.com 1
Measures of Central Tendency (MCT)
• A frequency distribution gives a general picture of the distribution of a variable.
• However, it cannot indicate the average value or the spread of the values.
• The tendency of statistical data to concentrate around a certain value is called “central tendency”.
• The various methods of determining the point about which the observations tend to concentrate are called measures of central tendency (MCT).

Measures of Central Tendency (MCT)

• The objective of calculating an MCT is to determine a single figure which may be used to represent the whole data set.
• In that sense it is an even more compact description of the statistical data than the frequency distribution.
• Since an MCT represents the entire data set, it facilitates comparison within one group or between groups of data.
Characteristics of a good MCT
An MCT is good or satisfactory if it possesses the following characteristics.
1. It should be based on all the observations.
2. It should not be affected by extreme values.
3. It should be as close to the maximum number of values as possible.
4. It should have a definite value.
5. It should be capable of further algebraic treatment.
6. It should be stable with regard to sampling.

• The most common measures of central tendency include:
– Arithmetic mean
– Median
– Mode
– Others

1. Arithmetic Mean
A. Ungrouped Data
• The arithmetic mean is the "average" of the data set and by far the most widely used measure of central location. It is usually denoted by x̄.
• It is the sum of all the observations divided by the total number of observations.
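
As a sketch of the definition above in symbols, assuming the usual notation x̄ for the sample mean and x_1, ..., x_n for the n observations (these symbols are an assumption, not taken from the slide):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
```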

B. Grouped Data
• In calculating the mean from grouped data, we assume that all values falling into a particular class interval are located at the mid-point of the interval. It is calculated as follows:

$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i}$$

where,
k = the number of class intervals
mi = the mid-point of the ith class interval
fi = the frequency of the ith class interval

Example. Compute the mean age of 169 subjects from the grouped data.

Class interval   Mid-point (mi)   Frequency (fi)   mi × fi
10-19            14.5             4                58.0
20-29            24.5             66               1617.0
30-39            34.5             47               1621.5
40-49            44.5             36               1602.0
50-59            54.5             12               654.0
60-69            64.5             4                258.0
Total            -                169              5810.5

Mean = 5810.5/169 = 34.38 years
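
The grouped-mean calculation can be checked with a few lines of Python. This is a minimal sketch: the mid-points and frequencies are copied from the age table above, and the function name `grouped_mean` is chosen here for illustration only.

```python
# Mid-points and frequencies copied from the age table above.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]


def grouped_mean(midpoints, frequencies):
    """Return sum(m_i * f_i) / sum(f_i) for grouped data."""
    weighted_total = sum(m * f for m, f in zip(midpoints, frequencies))
    n = sum(frequencies)
    return weighted_total / n


mean_age = grouped_mean(midpoints, frequencies)
print(round(mean_age, 2))  # 34.38
```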

When the data are skewed, the mean is “dragged” in the direction of the skewness.

• In extreme cases, all but one of the sample points can lie on one side of the arithmetic mean. The mean is then a poor measure of central location and does not reflect the center of the sample.
Properties of the Arithmetic Mean.
• For a given set of data there is one and only one arithmetic mean (uniqueness).
• Easy to calculate and understand (simple).
• Influenced by each and every value in a data set.
• Greatly affected by extreme values.
• For grouped data, if any class interval is open-ended, the arithmetic mean cannot be calculated.

2. Median
a) Ungrouped data
• The median is the value which divides the data set into two equal
parts.
• If the number of values is odd, the median will be the
middle value when all values are arranged in order of
magnitude.
• When the number of observations is even, there is no
single middle value but two middle observations.
• In this case the median is the mean of these two middle
observations, when all observations have been arranged in
the order of their magnitude.

• The median is a better description (than the mean) of the majority when the distribution is skewed.
• Example
– Data: 14, 89, 93, 95, 96
– Skewness is reflected in the outlying low value of 14
– The sample mean is 77.4
– The median is 93
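
The contrast between the two measures in this example can be reproduced with Python's standard `statistics` module. This is only an illustrative sketch; the variable names are chosen here and are not part of the slides.

```python
import statistics

# Data from the example above: one outlying low value (14).
scores = [14, 89, 93, 95, 96]

mean_value = statistics.mean(scores)      # pulled toward the outlier
median_value = statistics.median(scores)  # middle value of the sorted data

# Count how many observations lie below the mean.
below_mean = sum(1 for x in scores if x < mean_value)

print(mean_value, median_value, below_mean)
```

Only the outlier lies below the mean, while four of the five values lie above it, which is why the median describes the bulk of the data better here.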

b) Grouped data
• In calculating the median from grouped data, we
assume that the values within a class-interval are
evenly distributed through the interval.
• The first step is to locate the class interval in which
the median is located, using the following procedure.
• Compute n/2 and find the first class interval whose cumulative frequency is at least n/2.
• Then, use the following formula.

$$\tilde{x} = L_m + \left(\frac{\frac{n}{2} - F_c}{f_m}\right) W$$

where,
Lm = lower true class boundary of the interval containing the median
Fc = cumulative frequency of the class interval immediately preceding the median class interval
fm = frequency of the interval containing the median
W = class interval width
n = total number of observations
Example. Compute the median age of 169 subjects from the grouped data.

n/2 = 169/2 = 84.5

Class interval   Mid-point (mi)   Frequency (fi)   Cum. freq
10-19            14.5             4                4
20-29            24.5             66               70
30-39            34.5             47               117
40-49            44.5             36               153
50-59            54.5             12               165
60-69            64.5             4                169
Total                             169

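• n/2 = 84.5 falls in the 3rd class interval (30-39), the first interval whose cumulative frequency (117) reaches 84.5
• Lower true class boundary Lm = 29.5, upper true class boundary = 39.5, so W = 10
• Frequency of the median class fm = 47
• Cumulative frequency before the median class Fc = 70
• (n/2 – Fc) = 84.5 – 70 = 14.5

• Median = 29.5 + (14.5/47) × 10 = 32.59 ≈ 33 years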
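
The interpolation steps above can be sketched in Python as follows. The function name `grouped_median` and its argument layout are illustrative assumptions; the lower true class boundaries (9.5, 19.5, ...) are derived from the class limits in the age table, and the class width is assumed to be the same for every interval.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data: Lm + ((n/2 - Fc) / fm) * W.

    Assumes values are spread evenly within each class interval.
    """
    n = sum(frequencies)
    if n == 0:
        raise ValueError("at least one observation is required")
    half = n / 2
    cumulative = 0  # cumulative frequency before the current class (Fc)
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= half:
            # First class whose cumulative frequency reaches n/2.
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")


lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5]
frequencies = [4, 66, 47, 36, 12, 4]

median_age = grouped_median(lower_boundaries, frequencies, width=10)
print(round(median_age, 2))  # 32.59
```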

Properties of the median
• There is only one median for a given set of data
(uniqueness)
• The median is easy to calculate
• The median is a positional average and hence it is insensitive to very large or very small values.
• Median can be calculated even in the case of
open end intervals
• It is determined mainly by the middle points and
less sensitive to the remaining data points
(weakness).

3. Mode

• The mode is the most frequently occurring value among all the observations in a set of data.
• It is not influenced by extreme values.
• It is possible to have more than one mode or no mode.
• It is not a good summary of the majority of the data.

a) Ungrouped data
• It is a value which occurs most frequently in a set of
values.
• If all the values are different, there is no mode. On the other hand, a set of values may have more than one mode.

• Example
• Data are: 1, 2, 3, 4, 4, 4, 4, 5, 5, 6
• Mode is 4
• Example
• Data are: 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8
• There are two modes – 2 & 5
• This distribution is said to be “bi-modal”
• Example
• Data are: 2.62, 2.75, 2.76, 2.86, 3.05, 3.12
• No mode, since all the values are different
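
The three examples above can be reproduced with a small helper. This is a sketch under the convention stated on the slide (no mode when every value occurs only once); the function name `modes` is chosen here for illustration.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once: no mode under the slide's convention.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


unimodal = modes([1, 2, 3, 4, 4, 4, 4, 5, 5, 6])
bimodal = modes([1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 8])
no_mode = modes([2.62, 2.75, 2.76, 2.86, 3.05, 3.12])

print(unimodal, bimodal, no_mode)  # [4] [2, 5] []
```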

b) Grouped data
• To find the mode of grouped data, we usually refer to
the modal class, where the modal class is the class
interval with the highest frequency.
• If a single value for the mode of grouped data must
be specified, it is taken as the mid-point of the modal
class interval.
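
As an illustration of the modal class and its mid-point, the sketch below reuses the age table from the earlier grouped-data examples. Applying it to that table is an assumption made here; the slides do not work this example.

```python
# Class labels, mid-points and frequencies from the age table used earlier.
classes = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69"]
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]

# Index of the class with the highest frequency (first one if tied).
modal_index = max(range(len(frequencies)), key=lambda i: frequencies[i])

modal_class = classes[modal_index]
mode_estimate = midpoints[modal_index]  # mid-point of the modal class

print(modal_class, mode_estimate)  # 20-29 24.5
```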

• The mode can also be estimated from the frequencies of the classes adjacent to the modal class:

$$\hat{x} = L + w\left(\frac{f_2}{f_0 + f_2}\right)$$

where
L = lower boundary of the modal class
f0 = the frequency of the class next below the modal class in value
f2 = the frequency of the class next above the modal class in value
w = length of the interval of the modal class

Properties of mode
• It is not affected by extreme values.
• It can be calculated for distributions with open-end classes.
• Often its value is not unique.
• The main drawback of the mode is that often it does not exist.

Which measure of central tendency is best with a
given set of data?

• Two factors are important in making this decision:
– The scale of measurement (type of data)
– The shape of the distribution of the observations

• The mean can be used for discrete and continuous data.
• The median is appropriate for discrete and continuous data as well, but can also be used for ordinal data.
• The mode can be used for all types of data, but may be especially useful for nominal and ordinal measurements.
• For discrete or continuous data, the “modal class” can be used.

(a) Symmetric and unimodal distribution: mean, median, and mode should all be approximately the same.

[Figure: symmetric unimodal distribution with the mean, median, and mode marked at the same point]

(b) Bimodal — Mean and median should be about the
same, but may take a value that is unlikely to occur; two
modes might be best

(c) Skewed to the right (positively skewed): the mean is sensitive to extreme values, so the median might be more appropriate.

[Figure: right-skewed distribution with the mode, median, and mean marked]

(d) Skewed to the left (negatively skewed): same as (c).

[Figure: left-skewed distribution with the mode, median, and mean marked]

Measures of Dispersion

Consider the following two sets of data:

A: 177 193 195 209 226    Mean = 200
B: 192 197 200 202 209    Mean = 200

Two or more data sets may have the same mean and/or median and still differ considerably in how spread out their values are.
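
A quick check of the two data sets makes the point concrete. The sketch below uses the range and the sample standard deviation, both of which are defined later in this chapter; the helper name `summarize` is an illustrative choice.

```python
import statistics

set_a = [177, 193, 195, 209, 226]
set_b = [192, 197, 200, 202, 209]


def summarize(data):
    """Return (mean, range, sample SD) for a list of numbers."""
    return (
        statistics.mean(data),
        max(data) - min(data),
        statistics.stdev(data),  # sample SD, divisor n - 1
    )


summary_a = summarize(set_a)
summary_b = summarize(set_b)

print(summary_a)
print(summary_b)
```

Both sets have a mean of 200, but set A has a range of 49 and a sample SD of about 18.44, against 17 and about 6.28 for set B.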

These two distributions have the same mean,
median, and mode

Measures of Dispersion
• MCT alone are not enough to give a clear understanding of the distribution of the data.
• Measures of dispersion quantify the variation or dispersion of a set of data around its central location.
• Dispersion refers to the variety exhibited by the values of the data.
• The amount of dispersion is small when the values are close together and large when they are widely scattered.

Measures of Dispersion
Other synonymous terms:
– “Measures of Variation”
– “Measures of Spread”
– “Measures of Scatter”

• Measures of dispersion include:
– Range
– Variance
– Standard deviation
– Coefficient of variation
– Standard error
– Others

1. Range (R)
• The difference between the largest and smallest
observations in a sample.

• Range = Maximum value – Minimum value

• Example –
– Data values: 5, 9, 12, 16, 23, 34, 37, 42
– Range = 42-5 = 37
• A data set with a larger range exhibits more variability.

Properties of range
• It is the simplest, crude measure and can be easily understood.
• It takes into account only two values, which makes it a poor measure of dispersion.
• It is very sensitive to extreme observations.
• The larger the sample size, the larger the range tends to be.

2. Variance (σ², s²)
• Variance is used to measure the dispersion of values relative to the mean.
• The variance is the average of the squares of the deviations taken from the mean.
• When values are close to their mean (narrow range), the dispersion is less than when there is scattering over a wide range.
– Population variance = σ²
– Sample variance = S²

a) Ungrouped data
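
No formula survives under this heading in this copy of the slides. As a sketch, the standard definitions for ungrouped data are given below, assuming the usual notation (not taken from the slide): N and μ for the population size and mean, n and x̄ for the sample size and mean.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
\qquad\text{and}\qquad
S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```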

Degrees of freedom
• In computing the variance there are (n-1) degrees of freedom because only (n-1) of the deviations are independent of each other.
• Because the deviations from the mean always sum to zero, the last one can always be calculated from the others.
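
The point about (n-1) independent deviations can be seen numerically. The sketch below uses data set A from the dispersion introduction as an illustrative choice; the variable names are assumptions made here.

```python
# Data set A from the introduction to measures of dispersion.
data = [177, 193, 195, 209, 226]
n = len(data)
mean = sum(data) / n

# Deviations from the mean always sum to zero.
deviations = [x - mean for x in data]

# So the last deviation is fixed once the first n - 1 are known.
last_from_others = -sum(deviations[:-1])

# Sample variance: squared deviations divided by the n - 1 degrees of freedom.
sample_variance = sum(d ** 2 for d in deviations) / (n - 1)

print(deviations, last_from_others, sample_variance)
```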

b) Grouped data
$$S^2 = \frac{\sum_{i=1}^{k} (m_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i - 1}$$

where
mi = the mid-point of the ith class interval
fi = the frequency of the ith class interval
k = the number of class intervals
x̄ = the sample mean

Properties of Variance:
• The main disadvantage of variance is that its unit is the square of the unit of the original measurement values.
• The variance gives more weight to extreme values than to those near the mean, because the deviations are squared.
• The drawbacks of variance are overcome by the standard deviation.

4. Standard deviation (σ, s)
• It is the square root of the variance.
• This produces a measure having the same scale as that of the individual values.

$$\sigma = \sqrt{\sigma^2} \quad\text{and}\quad S = \sqrt{S^2}$$

Example. Compute the variance and SD of the age of 169 subjects from the grouped data.

Mean = 5810.5/169 = 34.38 years

Class interval   (mi)   (fi)   (mi-Mean)   (mi-Mean)²   (mi-Mean)² fi
10-19            14.5   4      -19.88      395.2144     1580.8576
20-29            24.5   66     -9.88       97.6144      6442.5504
30-39            34.5   47     0.12        0.0144       0.6768
40-49            44.5   36     10.12       102.4144     3686.9184
50-59            54.5   12     20.12       404.8144     4857.7728
60-69            64.5   4      30.12       907.2144     3628.8576
Total                   169                1907.2864    20197.6336

S² = 20197.63/(169-1) = 120.22
SD = √S² = √120.22 = 10.96 years
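
The grouped variance and SD can be checked in Python. This is a minimal sketch that applies the grouped-data formula to the mid-points and frequencies of the age table; the function name `grouped_variance` is an illustrative choice. It works with the unrounded mean, so its results can differ in the last decimal place from a table built with a rounded mean.

```python
import math

midpoints = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
frequencies = [4, 66, 47, 36, 12, 4]


def grouped_variance(midpoints, frequencies):
    """Sample variance of grouped data: sum((m_i - mean)^2 * f_i) / (n - 1)."""
    n = sum(frequencies)
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    squared_dev_total = sum(
        f * (m - mean) ** 2 for m, f in zip(midpoints, frequencies)
    )
    return squared_dev_total / (n - 1)


variance_age = grouped_variance(midpoints, frequencies)
sd_age = math.sqrt(variance_age)

print(round(variance_age, 2), round(sd_age, 2))  # 120.22 10.96
```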

Properties of SD
• The SD has the advantage of being expressed in the same units of measurement as the mean.
• SD is considered to be the best measure of dispersion and is widely used because of the properties of the theoretical normal curve.
• However, if the units of measurement of the variables in two data sets are not the same, their variability cannot be compared by comparing the values of SD.
SD vs Standard Error (SE)
• SD describes the variability among individual values in a given data set.
• SE describes the variability among sample means obtained from one sample to another.
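
The slide gives no formula for the SE. The sketch below assumes the usual estimate of the standard error of the mean, s/√n, and applies it to data set A from the dispersion introduction; both choices are assumptions made for illustration.

```python
import math
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


sample = [177, 193, 195, 209, 226]

sd = statistics.stdev(sample)  # spread of the individual values
se = standard_error(sample)    # spread expected among sample means

print(round(sd, 2), round(se, 2))  # 18.44 8.25
```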

5. Coefficient of variation (CV)
• When two data sets have different units of measurement, or their means differ substantially in size, the CV should be used as a measure of dispersion.
• It is the best measure for comparing the variability of two sets of observations.
• A data set with a smaller coefficient of variation is considered more consistent.

• CV = (Standard deviation / Mean) × 100
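
As a short illustration of the CV formula, the sketch below compares data sets A and B from the dispersion introduction. Using these two sets and the sample SD is an assumption made here; the slides do not work a CV example.

```python
import statistics


def coefficient_of_variation(data):
    """CV in percent: (sample SD / mean) * 100."""
    return statistics.stdev(data) / statistics.mean(data) * 100


set_a = [177, 193, 195, 209, 226]
set_b = [192, 197, 200, 202, 209]

cv_a = coefficient_of_variation(set_a)
cv_b = coefficient_of_variation(set_b)

print(round(cv_a, 2), round(cv_b, 2))  # 9.22 3.14
```

Set B has the smaller CV, so by the rule above it is the more consistent of the two.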
