CH 4
Measures of
Central Tendency
11/14/2023 1
• Measures of central tendency describe a
distribution near its center.
• They indicate the middle, most likely, or most frequent values of the data.
• In other words, they tell us where the center of
the distribution of the data is located.
The Summation Notation
Summation or sigma notation is a convenient
and simple form of shorthand used to give a
concise expression for a sum of the values of a
variable.
• In statistics, the symbol ∑ (Greek capital letter sigma) means to add, or find the sum.
• For example, ΣX means to add the numbers represented by the variable X.
• Thus, if X represents 5, 2, 8, 4, and 6, then ΣX = 5 + 2 + 8 + 4 + 6 = 25.
• Sometimes a subscript notation is used, such as: $\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$
• Thus, the sum of all elements of X except the first and the last would be indicated as $\sum_{i=2}^{n-1} X_i$, which is read as the sum of X with i going from 2 to n − 1.
• Some formulas require that each number be squared before the numbers are summed.
• This is indicated by ΣX², which means to square each value before summing. For the data above, ΣX² = 5² + 2² + 8² + 4² + 6² = 25 + 4 + 64 + 16 + 36 = 145.
• It is very important to note that it makes a big
difference whether the numbers are squared
first and then summed or summed first and then
squared.
• The symbol (ΣX)² indicates that the numbers should be summed first and then squared.
• For the above example, this equals (5 + 2 + 8 + 4 + 6)² = 25² = 625.
• This, of course, is quite different from ΣX² = 145.
• Sometimes a formula requires that the sum of
cross products be computed.
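• For instance, given
X    Y
2    3
1    6
4    5
the sum of cross products is ΣXY = (2)(3) + (1)(6) + (4)(5) = 6 + 6 + 20 = 32.
• Similarly, for X = 5, 2, 8, 4, 6 (mean X̄ = 25/5 = 5), the squared deviations from the mean are summed as follows:
X    X − X̄    (X − X̄)²
5     0         0
2    −3         9
8     3         9
4    −1         1
6     1         1
Σ(X − X̄)² = 20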
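The sums on the preceding slides can be checked with a short Python sketch. The variable names are our own (hypothetical), and the data are the chapter's examples: X = 5, 2, 8, 4, 6 and the three (X, Y) pairs.

```python
# Chapter data; variable names are illustrative only
x = [5, 2, 8, 4, 6]
pairs = [(2, 3), (1, 6), (4, 5)]        # (X, Y) pairs for the cross-product example

sum_x = sum(x)                          # ΣX
sum_x_sq = sum(v ** 2 for v in x)       # ΣX²: square each value, then add
sq_sum_x = sum(x) ** 2                  # (ΣX)²: add first, then square
sum_middle = sum(x[1:-1])               # X_i summed for i = 2, ..., n-1 (drops first and last)
sum_xy = sum(a * b for a, b in pairs)   # ΣXY: sum of cross products

mean_x = sum_x / len(x)                      # X̄ = 5.0
ss_dev = sum((v - mean_x) ** 2 for v in x)   # Σ(X − X̄)²

print(sum_x, sum_x_sq, sq_sum_x, sum_middle, sum_xy, ss_dev)   # 25 145 625 14 32 20.0
```

Note how `sum_x_sq` and `sq_sum_x` differ only in where the squaring happens, which is exactly the distinction between ΣX² and (ΣX)².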
Basic properties of summation notation
X    Y
3    8
2    3
4    1
Σ(X + Y) = ΣX + ΣY. In the above example:
Σ(X + Y) = 11 + 5 + 5 = 21
ΣX = 3 + 2 + 4 = 9
ΣY = 8 + 3 + 1 = 12
ΣX + ΣY = 9 + 12 = 21
(ΣX)(ΣY) ≠ ΣXY. In the above example, (ΣX)(ΣY) = 9 × 12 = 108, whereas ΣXY = 3 × 8 + 2 × 3 + 4 × 1 = 34. Thus, 108 ≠ 34.
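• In the above example, ΣX² = 9 + 4 + 16 = 29, whereas (ΣX)² = 9 × 9 = 81. Thus, 29 ≠ 81.
• For any constant c, $\sum_{i=1}^{n} c = nc$ and $\sum_{i=1}^{n} cX_i = c\sum_{i=1}^{n} X_i$.
Example: with c = 2 and X = 3, 2, 4, Σ2X = 6 + 4 + 8 = 18 = 2(9) = 2ΣX, and Σ2 = 2 + 2 + 2 = 6 = 3 × 2.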
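A minimal Python sketch of the summation properties, using the X, Y table above. The constant c = 2 is an arbitrary choice for illustration, and the variable names are hypothetical.

```python
x = [3, 2, 4]
y = [8, 3, 1]
c = 2  # arbitrary constant, chosen only for illustration

sum_of_pair_sums = sum(a + b for a, b in zip(x, y))   # Σ(X + Y)
sums_added = sum(x) + sum(y)                          # ΣX + ΣY
product_of_sums = sum(x) * sum(y)                     # (ΣX)(ΣY)
sum_of_products = sum(a * b for a, b in zip(x, y))    # ΣXY
sum_of_squares = sum(a ** 2 for a in x)               # ΣX²
square_of_sum = sum(x) ** 2                           # (ΣX)²
sum_of_constant = sum(c for _ in x)                   # Σc over n = 3 terms
sum_of_scaled = sum(c * a for a in x)                 # ΣcX

print(sum_of_pair_sums, sums_added)        # 21 21  (always equal)
print(product_of_sums, sum_of_products)    # 108 34 (not equal in general)
print(sum_of_squares, square_of_sum)       # 29 81  (not equal in general)
print(sum_of_constant, len(x) * c)         # 6 6    (Σc = nc)
print(sum_of_scaled, c * sum(x))           # 18 18  (ΣcX = cΣX)
```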
Types of measures of central tendency
• Arithmetic mean: the sum of the data values divided by the number of observations.
• The arithmetic mean, or average value, of a variable is the most important numerical measure of central tendency.
• For ungrouped data, the population mean (usually denoted by μ) is the sum of all the population values divided by the total number of population values, N:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where
X̄ = sample mean
n = number of elements in the sample (sample size)
• A sample of five executives received the following salaries (Birr in thousands): 14.0, 15.0, 17.0, 16.0, and 15.0. Find the mean salary.
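A minimal Python sketch of the sample-mean formula applied to the salary data. The helper name `arithmetic_mean` is our own (hypothetical).

```python
def arithmetic_mean(values):
    """Hypothetical helper: sum of the observations divided by their number (X̄ = ΣX / n)."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

salaries = [14.0, 15.0, 17.0, 16.0, 15.0]   # Birr, in thousands
mean_salary = arithmetic_mean(salaries)
print(mean_salary)   # 15.4, i.e. 15,400 Birr
```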
Properties of Arithmetic mean
Arithmetic mean is the most widely used measure of
location/central tendency.
All the values are included in computing the mean.
A set of data has a unique mean.
Every set of quantitative data has a mean.
The mean is affected by unusually large or small data values, called outliers, and may not be the appropriate average to use in such situations.
We cannot determine a mean for data with open-ended classes.
The sum of the deviations of each value from the mean is always zero: Σ(X − X̄) = 0.
Example: given Xi = 5, 2, 4, 8, 6, the mean is X̄ = 25/5 = 5, and Σ(Xi − X̄) = 0 + (−3) + (−1) + 3 + 1 = 0.
Example: The mean ages of 12 men and 10 women are 45 and 42, respectively. What is the combined mean age?
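Both examples can be checked with a short Python sketch. The combined mean weights each group's mean by its size; the helper name `combined_mean` is our own (hypothetical).

```python
x = [5, 2, 4, 8, 6]
mean_x = sum(x) / len(x)                 # 5.0
deviations = [v - mean_x for v in x]     # [0.0, -3.0, -1.0, 3.0, 1.0]
print(sum(deviations))                   # 0.0

def combined_mean(sizes, means):
    """Hypothetical helper: pooled mean of several groups, Σ(n_k · mean_k) / Σn_k."""
    total = sum(n * m for n, m in zip(sizes, means))
    return total / sum(sizes)

overall = combined_mean([12, 10], [45, 42])   # (540 + 420) / 22
print(round(overall, 2))                      # 43.64
```

Simply averaging 45 and 42 would give 43.5, which ignores that there are more men than women in the group.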
• The arithmetic mean is affected by both change
of origin and scale.
• That is, given the mean of a set of data values, if we add a constant c to (or subtract it from) every data value, the new mean is the old mean plus (or minus) c (change of origin).
• Given the mean of a set of data values, if we multiply every data value by a constant c, the new mean is c times the old mean (change of scale).
• Example: The mean life of a certain brand of
bulbs is 1030 hours.
• If a new process adds 50 hours to the life of each bulb, what will their mean life be? (Ans. 1030 + 50 = 1080 hours.)
• If a recently developed production method doubles the life of each bulb, what will happen to their mean life? (Ans. It doubles: 2 × 1030 = 2060 hours.)
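A Python sketch of the change-of-origin and change-of-scale rules. The three bulb lifetimes are hypothetical values, not from the chapter, chosen only so that their mean is 1030 hours.

```python
import statistics

# Hypothetical bulb lifetimes in hours, chosen only so that their mean is 1030
lifetimes = [1000, 1030, 1060]
old_mean = statistics.mean(lifetimes)

shifted = [h + 50 for h in lifetimes]   # change of origin: add c = 50 to every value
scaled = [2 * h for h in lifetimes]     # change of scale: multiply every value by c = 2

print(old_mean, statistics.mean(shifted), statistics.mean(scaled))   # 1030 1080 2060
```

Any other data set with mean 1030 gives the same two answers, which is the point of the rules.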
Arithmetic mean for grouped data
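• The mean of a sample of data organized in a frequency distribution is computed by the following formula:
• $\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{\sum f_i x_i}{n}$, where x_i is the class mark (midpoint) of the ith class and f_i is its frequency.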
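A minimal Python sketch of the grouped-data mean. As an illustration it reuses the class marks and frequencies of the salary-increase table that appears in the geometric mean example. The helper name is our own (hypothetical).

```python
def grouped_mean(class_marks, freqs):
    """Hypothetical helper: X̄ = Σ f·x / Σ f, with x the class mark (midpoint) of each class."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(class_marks, freqs)) / n

# Class marks and frequencies from the salary-increase table in this chapter
marks = [2, 7, 12, 17]
freqs = [5, 6, 3, 2]
print(grouped_mean(marks, freqs))   # 122 / 16 = 7.625
```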
Weighted mean
• It is a special case of arithmetic mean.
• It occurs when there are several observations of the same value, which might happen if the data have been grouped into a frequency distribution.
• It is the mean of data values that have been weighted according to their relative importance.
• The formulas for the weighted mean of a population and of a sample are as follows:
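$\mu_w = \frac{\sum w_i X_i}{\sum w_i}$ = population weighted mean
$\bar{X}_w = \frac{\sum w_i X_i}{\sum w_i}$ = sample weighted mean
where w_i is the weight attached to the value X_i.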
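A minimal Python sketch of the weighted mean. The scores and weights are hypothetical values, not from the chapter; the last line shows that the earlier combined-mean-age example is a weighted mean with group sizes as weights.

```python
def weighted_mean(values, weights):
    """Hypothetical helper: Σ w·X / Σ w."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical scores and importance weights (illustration only)
scores = [80, 90, 70]
weights = [2, 3, 5]
wm = weighted_mean(scores, weights)
print(wm)   # (160 + 270 + 350) / 10 = 78.0

# The combined mean age (12 men, mean 45; 10 women, mean 42) is a weighted mean
ages = weighted_mean([45, 42], [12, 10])
print(round(ages, 2))   # 43.64
```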
Geometric mean
• The geometric mean (GM) of n positive numbers is defined as the nth root of their product.
The formula is:
$GM = \sqrt[n]{X_1 X_2 \cdots X_n} = (X_1 X_2 \cdots X_n)^{1/n}$
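• Examples: The GM of 4 and 16 is $\sqrt{4 \times 16} = \sqrt{64} = 8$.
The GM of 1, 3, 9 is $\sqrt[3]{1 \times 3 \times 9} = \sqrt[3]{27} = 3$.
• The interest rates on three bonds were 5, 21, and 4 percent. The average interest rate is $\sqrt[3]{5 \times 21 \times 4} = \sqrt[3]{420} \approx 7.49$ percent.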
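• The population of Ethiopia increased from 53,000,000 in 1980 to 73,000,000 in 2000. What is the average annual rate of increase?
• $GM = \left(\frac{73{,}000{,}000}{53{,}000{,}000}\right)^{1/20} - 1 \approx 0.016 = 1.6\%$ per year (20 years elapsed).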
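A minimal Python sketch of the geometric mean, reproducing the examples above. The helper name is our own (hypothetical). `math.prod` needs Python 3.8 or later; the standard library also offers `statistics.geometric_mean`, which computes the same quantity via logarithms.

```python
import math

def geometric_mean(values):
    """Hypothetical helper: nth root of the product of n positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive values")
    return math.prod(values) ** (1 / len(values))

print(geometric_mean([4, 16]))               # 8.0
print(round(geometric_mean([1, 3, 9]), 6))   # 3.0
print(round(geometric_mean([5, 21, 4]), 2))  # 7.49 (percent)

# Average annual growth rate over the 20 years from 1980 to 2000
growth = (73_000_000 / 53_000_000) ** (1 / 20) - 1
print(round(growth, 3))                      # 0.016, i.e. about 1.6% per year
```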
• Example: Find the geometric mean for the
following grouped data on the percentage
increase in salary of 16 employees of a
company.
% increase in salary    Number of employees (f)    Class mark (x)
0-4                     5                          2
5-9                     6                          7
10-14                   3                          12
15-19                   2                          17
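• Solution: Using logarithms (base 10),
$\log GM = \frac{\sum f \log x}{n} = \frac{5\log 2 + 6\log 7 + 3\log 12 + 2\log 17}{16} \approx \frac{12.2742}{16} \approx 0.7671$, so $GM = \text{antilog}(0.7671) \approx 5.85$.
• The geometric mean percentage increase in salary is 5.85%.
• If n is large, computing the nth root of the product is tedious.
• To facilitate the computation of the GM, we make use of logarithms: $\log GM = \frac{1}{n}\sum_{i=1}^{n}\log X_i$ for ungrouped data, and $\log GM = \frac{\sum f_i \log x_i}{\sum f_i}$ for grouped data.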
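The logarithmic method for grouped data can be sketched in Python as follows, using the salary-increase table. The helper name is our own (hypothetical), and base-10 logarithms are used to match the hand calculation.

```python
import math

def grouped_geometric_mean(class_marks, freqs):
    """Hypothetical helper: log GM = Σ f·log x / n, then GM = antilog(log GM)."""
    n = sum(freqs)
    log_gm = sum(f * math.log10(x) for x, f in zip(class_marks, freqs)) / n
    return 10 ** log_gm

gm = grouped_geometric_mean([2, 7, 12, 17], [5, 6, 3, 2])
print(round(gm, 2))   # 5.85
```

Summing logarithms avoids forming the very large product 2⁵ · 7⁶ · 12³ · 17² directly, which is why the method scales to large n.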
• Remark: The Harmonic Mean is useful and appropriate
in finding average speeds and average rates.
• Example: Suppose a person drove 100 km at 40 km/hr and returned the same 100 km at 50 km/hr. What is the average speed?
• Solution:
• $HM = \frac{2}{\frac{1}{40} + \frac{1}{50}} = \frac{2}{0.045} \approx 44.44$ km/hr
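Arithmetic (weighted) mean, using the hours spent at each speed (100/40 = 2.5 and 100/50 = 2) as weights: $\frac{40(2.5) + 50(2)}{2.5 + 2} = \frac{200}{4.5} \approx 44.44$ km/hr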
• Here, we don't use the simple arithmetic mean of the speeds, (40 + 50)/2 = 45 km/hr, to find the average speed, because the man traveled equal distances at different speeds.
• If, however, he had traveled for equal lengths of time at each speed, the simple arithmetic mean would have given the correct average.
• If we want to use the arithmetic mean, we have to take weights (here, the time spent at each speed) into account.
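A Python sketch comparing the harmonic mean with the time-weighted arithmetic mean for the round-trip example. The helper name is our own (hypothetical); `statistics.harmonic_mean` is used only as a cross-check.

```python
import math
import statistics

def harmonic_mean(values):
    """Hypothetical helper: n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / v for v in values)

speeds = [40, 50]                     # km/hr on two legs of equal distance (100 km each)
hm = harmonic_mean(speeds)

# Weighted arithmetic mean: total distance / total time
avg_speed = (100 + 100) / (100 / 40 + 100 / 50)

print(round(hm, 2), round(avg_speed, 2))   # 44.44 44.44
print(sum(speeds) / len(speeds))           # 45.0, the simple mean, which overstates the average speed
assert math.isclose(hm, statistics.harmonic_mean(speeds))
```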
• Harmonic mean for grouped data (the salary-increase data above)
% increase in salary    Number of employees (f)    Class mark (x)
0-4                     5                          2
5-9                     6                          7
10-14                   3                          12
15-19                   2                          17
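• Solution: $HM = \frac{n}{\sum \frac{f}{x}} = \frac{16}{\frac{5}{2} + \frac{6}{7} + \frac{3}{12} + \frac{2}{17}} \approx \frac{16}{3.7248} \approx 4.30$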
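A minimal Python sketch of the grouped harmonic mean for the same table. The helper name is our own (hypothetical).

```python
def grouped_harmonic_mean(class_marks, freqs):
    """Hypothetical helper: HM = n / Σ(f / x), with x the class mark of each class."""
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(class_marks, freqs))

hm = grouped_harmonic_mean([2, 7, 12, 17], [5, 6, 3, 2])
print(round(hm, 2))   # 4.3
```

For these data the three means follow the usual ordering AM ≥ GM ≥ HM (7.625, about 5.85, about 4.30).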
The median of grouped data is computed as:
$Median = L + \frac{\frac{n}{2} - cf}{f} \times i$
Where
• L = the lower class boundary (class limit) of the median class
• n = the total number of observations
• cf = the cumulative frequency of the class preceding the median class
• i = the class interval (width)
• f = the frequency of the median class
Remark: The median class is the class with the smallest (less-than type) cumulative frequency greater than or equal to n/2.
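A minimal Python sketch of the grouped median formula, assuming equal-width classes. For illustration it uses the class-boundary table from the quartile example later in this chapter (n = 20, width 5). The helper name is our own (hypothetical).

```python
def grouped_median(lowers, freqs, width):
    """Hypothetical helper: Median = L + (n/2 − cf) / f · i, for equal-width classes."""
    half = sum(freqs) / 2
    cf = 0                                   # cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if f > 0 and cf + f >= half:         # first class whose cumulative frequency reaches n/2
            return lower + (half - cf) / f * width
        cf += f
    raise ValueError("empty frequency distribution")

# Lower class boundaries 5.5, 10.5, ..., 35.5 and their frequencies
lowers = [5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5]
freqs = [1, 2, 3, 5, 4, 3, 2]
med = grouped_median(lowers, freqs, 5)
print(med)   # 24.5
```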
Properties of Median
The data must be arranged in an array (in order) before we calculate the median.
There is a unique median for each data set.
Geometrically, the median divides the histogram or cumulative frequency curve into two parts with equal area.
The median is unaffected by the magnitude of extreme values.
It can be calculated for an open-ended frequency distribution if the median class does not lie in an open-ended class.
Mode (Mo), x̂
The mode is the most frequent value in a data set, that is, the value of the observation that appears most often.
The mode of a distribution is the value with the greatest concentration of observations, i.e., the value that occurs the greatest number of times in the distribution.
• Example: the examination scores for ten students are 81, 93, 84, 75, 68, 87, 81, 75, 81, and 87. Because the score of 81 occurs three times, more often than any other score, it is the mode.
• A data set may have:
No mode at all, e.g. 1, 3, 9, 0, 7, 8
One mode (unimodal), e.g. 1, 3, 1, 7, 1, 9; the mode is 1
Two modes (bimodal), e.g. 7, 2, 4, 4, 7; the modes are 7 and 4
Many modes (multimodal), e.g. 1, 0, 0, 1, 3, 2, 2, 3, 7, 7, 4, 9; the modes are 1, 0, 3, 2, and 7
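A Python sketch that finds all modes of a data set, reproducing the examples above. The helper name `modes` is our own (hypothetical). It assumes the convention used in these examples: if no value repeats, there is no mode. The standard library's `statistics.multimode` differs here, because it returns every value when none repeats.

```python
from collections import Counter

def modes(data):
    """Hypothetical helper: all values that occur with the highest frequency.

    Assumed convention: if no value repeats, the data have no mode (empty list).
    """
    counts = Counter(data)            # keeps the values in first-seen order
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([81, 93, 84, 75, 68, 87, 81, 75, 81, 87]))   # [81]
print(modes([1, 3, 9, 0, 7, 8]))                         # [] (no mode)
print(modes([1, 3, 1, 7, 1, 9]))                         # [1]
print(modes([7, 2, 4, 4, 7]))                            # [7, 4]
print(modes([1, 0, 0, 1, 3, 2, 2, 3, 7, 7, 4, 9]))       # [1, 0, 3, 2, 7]
```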
Mode of a grouped data
• The approximate modal value of grouped data is calculated by the following formula:
$Mode = L_o + \frac{f - f_1}{(f - f_1) + (f - f_2)} \times i$
Where:
Lo = the lower class boundary of the modal class (i.e., the class with the highest frequency).
f = the frequency of the modal class.
f1 = the frequency of the class immediately preceding the modal class.
f2 = the frequency of the class immediately following the modal class.
i = the class interval.
• Note: the data must be arranged in an array (in order) before the modal class is identified.
Example: Find the mode of the following distribution:
Class limits    Frequency
90-100          10
100-110         37
110-120         65
120-130         80
130-140         51
140-150         35
150-160         18
160-170         4
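• Solution:
• The 4th class (120-130) is the modal class, with f = 80.
• Lo = 120, f = 80, f1 = 65, f2 = 51, and i = 10, so
$Mode = 120 + \frac{80 - 65}{(80 - 65) + (80 - 51)} \times 10 = 120 + \frac{15}{44} \times 10 \approx 123.41$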
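A Python sketch of the grouped mode formula that also locates the modal class in the table. The helper name is our own (hypothetical), and it assumes equal-width classes. Treating a missing neighbouring class as having frequency 0 (when the modal class is first or last) is also our assumption.

```python
def grouped_mode(lowers, freqs, width):
    """Hypothetical helper: Mode = Lo + (f − f1) / ((f − f1) + (f − f2)) · i."""
    k = max(range(len(freqs)), key=lambda idx: freqs[idx])   # modal class (first one on ties)
    f = freqs[k]
    f1 = freqs[k - 1] if k > 0 else 0                # preceding class (assumed 0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0   # following class (assumed 0 if none)
    d1, d2 = f - f1, f - f2
    return lowers[k] + d1 / (d1 + d2) * width

lowers = [90, 100, 110, 120, 130, 140, 150, 160]
freqs = [10, 37, 65, 80, 51, 35, 18, 4]
mo = grouped_mode(lowers, freqs, 10)
print(round(mo, 2))   # 123.41
```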
Properties of mode
It is the easiest average to compute.
It can be obtained for both qualitative and quantitative
data.
It is not affected by extreme values.
The mode may not exist for a data set.
It is not necessarily unique; a data set can have more than one mode.
The mode is not based on all observations.
Note: in a symmetrical distribution, mean = median = mode. In a non-symmetrical (skewed) distribution, the mean and the mode lie at the two ends and the median lies between them. For moderately skewed distributions, the three are approximately related by: Mean − Mode ≈ 3(Mean − Median).
Distribution, shape and measures of central
tendency
• The relative values of the mean, median and
mode are very much dependent on the
shape of the distribution for the data they
are describing.
• The data distributions may be described in
terms of symmetry and skewness.
• In other words, data can be either symmetric
or skewed depending on how the data are
distributed around the center.
Symmetrical (normal, bell-shaped) distribution:
• occurs when the data values are evenly distributed
around the center.
• In a symmetrical distribution, the left and right sides of
the distribution are mirror images of each other, and
the values of the mean, median and mode are equal.
• Skewed distribution: occurs when the data values
are not evenly distributed around the center.
• Skewness refers to the tendency of the distribution to
“tail off” to the right or left.
• Skewness is lack of symmetry of a distribution.
Right (positively) skewed distribution:
• The mean is greater than the median, which in
turn is greater than the mode.
• In such distributions, the median tends to be a better measure of central tendency than the mean.
• In a positively skewed distribution, the majority of the data values fall to the left of the mean.
• The arithmetic mean is the largest of the three
measures as the mean is influenced by a few
extremely high values more than the Median or
Mode. Mode<Median<Mean
Left (negatively) skewed distribution:
• The mean is less than the median, which in turn
is less than the mode.
• As with the positively skewed distribution, the median is less influenced by extreme values than the mean and tends to be a better measure of central tendency.
• Mean<Median<Mode
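A Python sketch showing the ordering of the three averages in skewed data. The data set is hypothetical (not from the chapter): a small sample with a long right tail, and its mirror image (20 − x) with a long left tail. The helper name `summary` is our own.

```python
import statistics

# Hypothetical data with a long right tail (illustration only)
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image around 10 (20 − x): same shape, long left tail
left_skewed = [20 - v for v in right_skewed]

def summary(data):
    """Hypothetical helper: (mode, median, mean) of a data set."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)

r_mode, r_median, r_mean = summary(right_skewed)
l_mode, l_median, l_mean = summary(left_skewed)
print(r_mode, r_median, r_mean)   # 2 3.0 4.6    -> Mode < Median < Mean
print(l_mode, l_median, l_mean)   # 18 17.0 15.4 -> Mean < Median < Mode
```

The two extreme values 9 and 15 pull the mean well above the median, while the mode stays at the most common value.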
Quartiles, Deciles and Percentiles
• Descriptive measures that describe the position (place) of a value in a given data set or distribution are called positional averages.
• Measures that divide data into many equal parts are called quantiles (fractiles).
• The most important of these are quartiles, deciles, and percentiles.
• To obtain such measures, we first have to arrange the data in increasing order.
Quartiles
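• Quartiles divide the data into four equal parts. For ungrouped data, the jth quartile, denoted Qj where j = 1, 2, 3, is defined as
$Q_j = \text{value of the } \left(\frac{j(n+1)}{4}\right)^{\text{th}} \text{ observation in the ordered data}$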
Q1 gives the value where 25% of the observations lie
below and 75% above it.(Q1 - The lower or first quartile)
Q2 gives the value where 50% of the observations lie
below and 50% above it.(Q2 - The middle or second
quartile)
Q3 gives the value where 75% of the observations lie
below and 25% above it.(Q3 - The upper or third quartile)
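• Example: Find the quartiles (Q1, Q2, and Q3) of the following data: 8, 4, 8, 3, 4, 8, 5, 5, 10.
• Solution: First arrange the data: 3, 4, 4, 5, 5, 8, 8, 8, 10 (n = 9).
Q1 = value of the (1 × 10/4) = 2.5th item = 4 + 0.5(4 − 4) = 4
Q2 = value of the (2 × 10/4) = 5th item = 5
Q3 = value of the (3 × 10/4) = 7.5th item = 8 + 0.5(8 − 8) = 8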
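A Python sketch of positional quantiles for ungrouped data. It assumes the j(n + 1)/k position convention with linear interpolation between neighbouring ordered values; other textbooks and software use other conventions. The helper name is our own (hypothetical). The same function gives deciles (k = 10) and percentiles (k = 100). Python's `statistics.quantiles`, whose default "exclusive" method uses the same (n + 1) positions, serves as a cross-check.

```python
import statistics

def positional_quantile(data, j, parts):
    """Hypothetical helper: value at position j(n+1)/parts in the ordered data.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles; fractional positions
    are interpolated linearly between neighbouring ordered values.
    """
    s = sorted(data)
    n = len(s)
    pos = j * (n + 1) / parts          # 1-based, possibly fractional position
    if pos <= 1:
        return s[0]
    if pos >= n:
        return s[-1]
    k = int(pos)                       # whole part of the position
    frac = pos - k                     # fractional part
    return s[k - 1] + frac * (s[k] - s[k - 1])

data = [8, 4, 8, 3, 4, 8, 5, 5, 10]
quartiles = [positional_quantile(data, j, 4) for j in (1, 2, 3)]
print(quartiles)                          # [4.0, 5.0, 8.0]
print(statistics.quantiles(data, n=4))    # [4.0, 5.0, 8.0]
```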
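For grouped data, $Q_j = L_j + \frac{w_i}{f_i}\left(\frac{jn}{4} - cf\right)$, where
Lj = the lower class boundary of the jth quartile class
wi = class width
fi = frequency of the quartile class
n = total number of observations
cf = the cumulative frequency of the class preceding the quartile class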
Example
Class boundaries    fi    cf
5.5-10.5            1     1
10.5-15.5           2     3
15.5-20.5           3     6
20.5-25.5           5     11
25.5-30.5           4     15
30.5-35.5           3     18
35.5-40.5           2     20
The jth quartile class is the class with the smallest
cumulative frequency greater than or equal to j*n/4
• Q2?
• Q3?
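A Python sketch of the grouped-quartile formula for the table above, assuming equal class widths (here 5). The helper name is our own (hypothetical). It picks the class with the smallest cumulative frequency greater than or equal to jn/4 and then applies the formula.

```python
def grouped_quantile(lowers, freqs, width, j, parts):
    """Hypothetical helper: L + w/f · (j·n/parts − cf), for equal-width classes.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    """
    target = j * sum(freqs) / parts
    cf = 0                                   # cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if f > 0 and cf + f >= target:       # smallest cumulative frequency >= j·n/parts
            return lower + width / f * (target - cf)
        cf += f
    raise ValueError("requested position exceeds the total frequency")

lowers = [5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5]
freqs = [1, 2, 3, 5, 4, 3, 2]
q1, q2, q3 = (grouped_quantile(lowers, freqs, 5, j, 4) for j in (1, 2, 3))
print(round(q1, 2), q2, q3)   # 18.83 24.5 30.5
```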
Deciles
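• Deciles are measures that divide a distribution (data set) into ten equal parts.
• For a simple frequency distribution (ungrouped data), the jth decile, denoted Dj where j = 1, 2, 3, ..., 9, is defined as
$D_j = \text{value of the } \left(\frac{j(n+1)}{10}\right)^{\text{th}} \text{ observation in the ordered data}$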
• D1 gives the value where 10% of the observations lie below and 90% above it (D1 covers 10% of the distribution).
• D2 gives the value where 20% of the observations lie below and 80% above it (D2 covers 20% of the distribution).
• D3 gives the value where 30% of the observations lie below and 70% above it.
...
• D9 gives the value where 90% of the observations lie below and 10% above it (D9 covers 90% of the distribution).
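• For grouped data, $D_j = L_j + \frac{w_i}{f_i}\left(\frac{jn}{10} - cf\right)$, where the jth decile class is the class with the smallest cumulative frequency greater than or equal to jn/10.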
Percentiles:
• Percentiles are measures that divide the frequency distribution into one hundred equal parts.
• The values of the variable corresponding to these divisions are denoted P1, P2, ..., P99 and are called the first, the second, ..., the ninety-ninth percentile, respectively.
• The jth percentile class is the class with the smallest cumulative frequency greater than or equal to jn/100. For grouped data, $P_j = L_j + \frac{w_i}{f_i}\left(\frac{jn}{100} - cf\right)$.
Importance:
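• Example: For the data given below, compute the quartiles, D3, D7, P15, and P88, and interpret them.
Marks              Below 10   10-20   20-40   40-60   60-80   Above 80
No. of students    10         15      25      30      14      6
Cumulative freq.   10         25      50      80      94      100
Solution: n = 100. In each formula, l is the lower limit, c the class width, f the frequency, and c.f the cumulative frequency preceding the class that contains the required item.
Q1: size of the (n/4)th item = 25th item, so the quartile class is 10-20.
l = 10, c = 10, f = 15, c.f = 10
$Q_1 = l + \frac{c}{f}\left(\frac{n}{4} - c.f\right) = 10 + \frac{10}{15}(25 - 10) = 20$
The marks of 25% of the students are below 20.
Q3: size of the (3n/4)th item = 75th item, so the quartile class is 40-60.
l = 40, c = 20, f = 30, c.f = 50
$Q_3 = l + \frac{c}{f}\left(\frac{3n}{4} - c.f\right) = 40 + \frac{20}{30}(75 - 50) \approx 56.67$
The marks of 75% of the students are below 56.67.
D3: size of the (3n/10)th item = 30th item, so the decile class is 20-40.
l = 20, c = 20, f = 25, c.f = 25
$D_3 = l + \frac{c}{f}\left(\frac{3n}{10} - c.f\right) = 20 + \frac{20}{25}(30 - 25) = 24$
The marks of 30% of the students are below 24.
D7: size of the (7n/10)th item = 70th item, so the decile class is 40-60.
l = 40, c = 20, f = 30, c.f = 50
$D_7 = l + \frac{c}{f}\left(\frac{7n}{10} - c.f\right) = 40 + \frac{20}{30}(70 - 50) \approx 53.33$
The marks of 70% of the students are below 53.33.
P15: size of the (15n/100)th item = 15th item, so the percentile class is 10-20.
l = 10, c = 10, f = 15, c.f = 10
$P_{15} = l + \frac{c}{f}\left(\frac{15n}{100} - c.f\right) = 10 + \frac{10}{15}(15 - 10) \approx 13.33$
The marks of 15% of the students are below 13.33.
P88: size of the (88n/100)th item = 88th item, so the percentile class is 60-80.
l = 60, c = 20, f = 14, c.f = 80
$P_{88} = l + \frac{c}{f}\left(\frac{88n}{100} - c.f\right) = 60 + \frac{20}{14}(88 - 80) \approx 71.43$
The marks of 88% of the students are below 71.43.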
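A Python sketch that reproduces the worked example, with classes of unequal width. The helper name and the `(lower, width)` representation are our own (hypothetical). The open-ended classes "Below 10" and "Above 80" are represented as `(None, None)`, and the helper refuses to interpolate inside them. Q2 is included as well.

```python
def grouped_quantile(classes, freqs, j, parts):
    """Hypothetical helper: l + c/f · (j·n/parts − c.f); classes may differ in width.

    Each class is (lower limit l, width c); open-ended classes are (None, None).
    """
    target = j * sum(freqs) / parts
    cf = 0                                    # cumulative frequency before the current class
    for (lower, width), f in zip(classes, freqs):
        if f > 0 and cf + f >= target:
            if lower is None or width is None:
                raise ValueError("the requested value lies in an open-ended class")
            return lower + width / f * (target - cf)
        cf += f
    raise ValueError("requested position exceeds the total frequency")

# Marks: below 10, 10-20, 20-40, 40-60, 60-80, above 80
classes = [(None, None), (10, 10), (20, 20), (40, 20), (60, 20), (None, None)]
freqs = [10, 15, 25, 30, 14, 6]

requests = {"Q1": (1, 4), "Q2": (2, 4), "Q3": (3, 4), "D3": (3, 10),
            "D7": (7, 10), "P15": (15, 100), "P88": (88, 100)}
results = {name: grouped_quantile(classes, freqs, j, k) for name, (j, k) in requests.items()}
for name, value in results.items():
    print(name, round(value, 2))
# Expected (rounded): Q1 20, Q2 40, Q3 56.67, D3 24, D7 53.33, P15 13.33, P88 71.43
```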
• Note that:
i. Q1 = P25, Q2 = D5 = P50 = median, and Q3 = P75.
ii. D1 = P10, D2 = P20, D3 = P30, …, D9 = P90.