CH 4


CHAPTER FOUR

Measures of
Central Tendency

11/14/2023 1
• Measures of central tendency describe a
distribution near its center.
• They provide indications on middle values or
most likely or most frequent values.
• In other words, they tell us where the center of
the distribution of the data is located.
The Summation Notation
• Summation or sigma notation is a convenient
and simple form of shorthand used to give a
concise expression for the sum of the values of a
variable.
• In statistics, the symbol ∑ (Greek letter sigma) means to add
or find the sum.
• For example, ΣX means to add the numbers represented by
the variable X.
• Thus, if X represents 5, 2, 8, 4, and 6, then
ΣX = 5 + 2 + 8 + 4 + 6 = 25.
• Sometimes a subscript notation is used, such as: $\sum_{i=1}^{5} X_i$
• This notation means to find the sum of the five numbers
represented by X.
• This notation is read as follows: sum the values of $X_i$ from $X_1$
through $X_5$.
$\sum_{i=1}^{5} X_i = X_1 + X_2 + X_3 + X_4 + X_5$. Generally, in $\sum_{i=1}^{n} X_i$:
n is the upper limit (stopping point) of summation
$X_i$ is the typical element
i is the index of summation
1 is the starting point (lower limit) of summation
∑ is the summation sign
• In order to make formulas more general, variables can be used with the
summation notation.
• For example, $\sum_{i=1}^{n} X_i$ means to sum the values of X from $X_1$ to $X_n$, where n can
be any number.
• Often an abbreviated form of the summation notation is used.
• For example, ΣX means to sum all the values of X.
• When only a subset of the values of X is to be summed, the full version is
required.

• Thus, the sum of all elements of X except the
first and the last would be indicated as: $\sum_{i=2}^{n-1} X_i$
• which would be read as the sum of $X_i$ with i
going from 2 to n−1.
• Some formulas require that each number be
squared before the numbers are summed.
• This is indicated by ΣX², which means to square
each value before summing.

• It is very important to note that it makes a big
difference whether the numbers are squared
first and then summed or summed first and then
squared.
• The symbol (ΣX)² indicates that the numbers
should be summed first and then squared.
• For the above example, this equals: (5
+ 2 + 8 + 4 + 6)² = 25² = 625.
• This, of course, is quite different from ΣX² = 25 + 4 + 64 + 16 + 36 = 145.
• Sometimes a formula requires that the sum of
cross products be computed.
• For instance, given
X Y
2 3
1 6
4 5

What is ΣXY? The sum of cross products (2 x 3) + (1 x 6) + (4 x 5) = 32
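The summation rules so far can be checked numerically. The following is a minimal Python sketch using the values from the examples above; the variable names are illustrative and not part of the original notes.

```python
# Values of X from the earlier example, and the X, Y pairs from the table above.
X = [5, 2, 8, 4, 6]

sum_x = sum(X)                           # ΣX
sum_of_squares = sum(x ** 2 for x in X)  # ΣX²: square each value, then sum
square_of_sum = sum(X) ** 2              # (ΣX)²: sum first, then square

pairs_x = [2, 1, 4]
pairs_y = [3, 6, 5]
sum_cross = sum(x * y for x, y in zip(pairs_x, pairs_y))  # ΣXY

print(sum_x, sum_of_squares, square_of_sum, sum_cross)  # 25 145 625 32
```

Note that the sum of squares (145) and the square of the sum (625) differ, which is why the order of the two operations matters.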


• The notation $\sum (X - \bar{X})^2$ means perform the following
steps:
• Find the mean.
• Subtract the mean from each value.
• Square the answers and find the sum.
• Example: Find the value of $\sum (X - \bar{X})^2$ for the values
5, 2, 8, 4, 6. The mean is $\bar{X} = 25/5 = 5$.
X      X − X̄     (X − X̄)²
5      0         0
2      −3        9
8      3         9
4      −1        1
6      1         1
Total            20
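The three steps (find the mean, subtract it from each value, square and sum) can be sketched in a few lines of Python. This is only an illustration with the same five values; the names are assumptions.

```python
X = [5, 2, 8, 4, 6]

mean = sum(X) / len(X)                   # step 1: find the mean
deviations = [x - mean for x in X]       # step 2: subtract the mean from each value
squared = [d ** 2 for d in deviations]   # step 3: square the answers ...
sum_sq_dev = sum(squared)                # ... and find the sum

print(mean, deviations, sum_sq_dev)  # 5.0 [0.0, -3.0, 3.0, -1.0, 1.0] 20.0
```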

Basic properties of summation notation
X    Y    X + Y
3    8    11
2    3    5
4    1    5

• Σ(X + Y) = ΣX + ΣY. In the above example:
Σ(X + Y) = 11 + 5 + 5 = 21
ΣX = 3 + 2 + 4 = 9
ΣY = 8 + 3 + 1 = 12
ΣX + ΣY = 9 + 12 = 21
• (ΣX)(ΣY) ≠ ΣXY. In the above example: (ΣX)(ΣY) = 9 × 12 = 108, whereas
ΣXY = 3×8 + 2×3 + 4×1 = 34. Thus, 108 ≠ 34.
• ΣX² ≠ (ΣX)². In the above example, ΣX² = 9 + 4 + 16 = 29, whereas
(ΣX)² = 9 × 9 = 81. Thus, 29 ≠ 81.
• For any constant c,
Example:

Types of measures of central tendency
• Arithmetic mean: the sum of the data set
values divided by the number of observations.
• The arithmetic mean, or average value of a variable, is
one of the most important numerical measures of
central tendency.
• For ungrouped data, the population mean
(usually denoted by μ) is the sum of all the
population values divided by the total number of
population values:
$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
• Where N = the total number of values in the population
μ = the population mean
• The population mean applies when the data represent all of the
items within the population.
• For ungrouped data, the sample mean is the sum of all the sample
values divided by the number of sample values:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
$\bar{X}$ = the sample mean
n = the number of elements in the
sample (the sample size)
• Example: A sample of five executives received the
following salaries (Birr in thousands): 14.0, 15.0,
17.0, 16.0, and 15.0. Find the mean salary.
$\bar{X} = \frac{14.0 + 15.0 + 17.0 + 16.0 + 15.0}{5} = \frac{77.0}{5} = 15.4$
• Therefore, the mean salary of the executives is
Birr 15,400.00.
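As a quick check of the salary example, the sample mean can be computed directly. A minimal sketch; the function name `arithmetic_mean` is illustrative.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

salaries = [14.0, 15.0, 17.0, 16.0, 15.0]  # Birr in thousands
sample_mean = arithmetic_mean(salaries)
print(sample_mean)  # 15.4, i.e. Birr 15,400
```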

Properties of Arithmetic mean
• The arithmetic mean is the most widely used measure of
location/central tendency.
• All the values are included in computing the mean.
• A set of data has a unique mean.
• Every set of quantitative data has a mean.
• The mean is affected by unusually large or small data values,
called outliers, and may not be the appropriate average
to use in these situations.
• We cannot determine a mean for a frequency distribution
with open-ended classes.
• The sum of the deviations of each value from the mean
is always zero: $\sum (X_i - \bar{X}) = 0$.
Example: given $X_i$ = 5, 2, 4, 8, 6, the mean is 5 and the deviations
are 0, −3, −1, 3, 1, which sum to zero.
• If $\bar{X}_1$ and $\bar{X}_2$ are the arithmetic means of $n_1$ and $n_2$
observations respectively, then the combined mean will be:
$\bar{X}_c = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$
Example: The mean ages of 12 men and 10 women are 45 and 42 respectively.
What is the combined mean age?
$\bar{X}_c = \frac{12(45) + 10(42)}{12 + 10} = \frac{960}{22} \approx 43.64$ years
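The combined mean is simply a mean of group means weighted by group size. The sketch below applies that idea to the men/women example; `combined_mean` is a hypothetical helper name.

```python
def combined_mean(means, sizes):
    """Combine group means, weighting each mean by its group size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)

# 12 men with mean age 45 and 10 women with mean age 42
age = combined_mean([45, 42], [12, 10])
print(round(age, 2))  # 43.64
```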

• The arithmetic mean is affected by both change
of origin and change of scale.
• That is, given a mean for data values, if we add
or subtract a constant c from all data
values, the new mean will be the old mean plus
or minus c (change of origin).
• Given a mean for data values, if we multiply all
data values by a constant c, then the
new mean will be c times the old one (change of
scale).
• Example: The mean life of a certain brand of
bulbs is 1030 hours.
• If a new process adds 50 hours to the life of each
bulb, what will the new mean life be? (Ans.
1030 + 50 = 1080 hours.)
• If a recently developed method of production
doubles the life of each bulb,
what will happen to the mean life?
(Ans. 2 × 1030 = 2060 hours.)
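The effect of a change of origin and a change of scale can be seen on a small data set. The three bulb lifetimes below are hypothetical, chosen only so that their mean is 1030 hours.

```python
def mean(values):
    return sum(values) / len(values)

lives = [1000, 1030, 1060]         # hypothetical lifetimes (hours), mean 1030
shifted = [x + 50 for x in lives]  # change of origin: add 50 hours to each bulb
scaled = [2 * x for x in lives]    # change of scale: double each lifetime

print(mean(lives), mean(shifted), mean(scaled))  # 1030.0 1080.0 2060.0
```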

Arithmetic mean for grouped data
• The mean of a sample of data organized in a
frequency distribution is computed by the
following formula:
$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$
$X_i$ = class mark of the ith class
$f_i$ = frequency of the ith class
k = number of classes
Example: Compute the arithmetic mean for the
following grouped data:
Class boundaries   Class mark (Xi)   Fi    FiXi
5.5-10.5           8                 1     8
10.5-15.5          13                2     26
15.5-20.5          18                3     54
20.5-25.5          23                5     115
25.5-30.5          28                4     112
30.5-35.5          33                3     99
35.5-40.5          38                2     76
Total                                20    490
$\bar{X} = \frac{490}{20} = 24.5$
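The grouped-data mean can be reproduced from the class marks and frequencies in the table. A minimal sketch, with illustrative variable names:

```python
class_marks = [8, 13, 18, 23, 28, 33, 38]
frequencies = [1, 2, 3, 5, 4, 3, 2]

n = sum(frequencies)                                          # Σf
weighted_total = sum(f * x for f, x in zip(frequencies, class_marks))  # ΣfX
grouped_mean = weighted_total / n

print(n, weighted_total, grouped_mean)  # 20 490 24.5
```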

Weighted mean
• It is a special case of the arithmetic mean.
• It occurs when there are several observations of
the same value, which might occur if the data have
been grouped into a frequency distribution.
• It is the mean value of data values that have been
weighted according to their relative importance.
• The formula for the weighted mean for a
population or a sample is as follows:
$\mu_w = \frac{\sum w_i x_i}{\sum w_i}$ (population), $\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$ (sample)
$\mu_w$ is the population weighted mean
$\bar{x}_w$ is the sample weighted mean
$w_i$ is the weight assigned to the ith data value
$x_i$ is the ith data value
Example: During a one-hour period on Saturday
afternoon a waiter served fifty drinks: 5
drinks for Birr 0.50, 15 for Birr 0.75, 15 for Birr
0.90, and 15 for Birr 1.10. Compute the weighted
mean price of the drinks.
$\bar{x}_w = \frac{5(0.50) + 15(0.75) + 15(0.90) + 15(1.10)}{50} = \frac{43.75}{50} = 0.875$,
so the weighted mean price is about Birr 0.88 per drink.
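For the drinks example, the weights are the quantities sold at each price. The sketch below is one way to compute the weighted mean; `weighted_mean` is an illustrative helper, not a name from the notes.

```python
def weighted_mean(values, weights):
    """Σ(w·x) / Σw"""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [0.50, 0.75, 0.90, 1.10]  # Birr per drink
quantities = [5, 15, 15, 15]       # number of drinks sold at each price

price = weighted_mean(prices, quantities)
print(round(price, 3))  # 0.875
```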

Geometric mean
• The geometric mean (GM) of n positive numbers
is defined as the nth root of their product.
The formula is:
$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
• The geometric mean is useful in finding the average of
percents, ratios, indexes, or growth rates.
• It has wide application in business and economics
because we are often interested in finding the percentage
changes in sales, revenues, profits, GDP, etc.
• Examples: The GM of 4 and 16 is $\sqrt{4 \times 16} = 8$.
The GM of 1, 3, 9 is $\sqrt[3]{1 \times 3 \times 9} = 3$.
• The interest rates on three bonds were 5, 21, and 4
percent. The average interest rate is: $\sqrt[3]{5 \times 21 \times 4} = \sqrt[3]{420} \approx 7.49$ percent.
Example: The returns on investment earned by a company
for four successive years were 30%, 20%, −40%, and 200%.
What is the geometric rate of return on investment?
• Solution: A 30% return means an additional gain on top of what we
have (i.e., on top of 100%).
• A 30% return is therefore expressed as 1.3, and −40% implies a
reduction (1 − 0.4 = 0.6).
• $GM = \sqrt[4]{1.3 \times 1.2 \times 0.6 \times 3.0} = \sqrt[4]{2.808} \approx 1.294$, so the
geometric rate of return is about 29.4% per year.
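The geometric mean examples can be reproduced with a short helper. This is a sketch that assumes Python 3.8 or later (for `math.prod`); the returns are first converted to growth factors, as described above.

```python
import math

def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

print(round(geometric_mean([4, 16]), 2))    # 8.0
print(round(geometric_mean([1, 3, 9]), 2))  # 3.0

# Returns of 30%, 20%, -40% and 200% expressed as growth factors
factors = [1.30, 1.20, 0.60, 3.00]
rate = geometric_mean(factors) - 1
print(round(rate * 100, 1))  # 29.4 (percent per year)
```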
• Another use of the geometric mean is to determine the
percent increase in sales, production, or other business
or economic series from one time period to another:
$GM = \sqrt[n]{\frac{\text{value at end of period}}{\text{value at start of period}}} - 1$, where n is the number of periods.

Example: The production of soaps for a soap factory increased
from 755,000 in 1992 to 835,000 in 2000. What is the average annual rate
of production increase?
Rate of production increase = $\sqrt[8]{835{,}000 / 755{,}000} - 1 \approx 0.0127 = 1.27\%$

• The population of Ethiopia increased from
53,000,000 in 1980 to 73,000,000 in 2000.
What is the average annual increase?
• $GM = \sqrt[20]{73{,}000{,}000 / 53{,}000{,}000} - 1 \approx 0.016 = 1.6\%$
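Both growth examples use the same calculation: the nth root of the end-to-start ratio, minus one. A minimal sketch, where `average_growth_rate` is a hypothetical helper name:

```python
def average_growth_rate(start_value, end_value, periods):
    """Average rate of increase per period over the given number of periods."""
    return (end_value / start_value) ** (1 / periods) - 1

soap = average_growth_rate(755_000, 835_000, 2000 - 1992)
population = average_growth_rate(53_000_000, 73_000_000, 2000 - 1980)

print(f"{soap:.2%}", f"{population:.1%}")  # 1.27% 1.6%
```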

For grouped data, the geometric mean is calculated as:
$GM = \sqrt[n]{X_1^{f_1} \cdot X_2^{f_2} \cdots X_m^{f_m}}$
Where $f_i$ is the frequency of the
ith class,
$X_i$ is the class mark of the ith class,
m is the number of classes,
n = $\sum f_i$ is the total number of observations.

• Example: Find the geometric mean for the
following grouped data on the percentage
increase in salary of 16 employees of a
company.
% increase in salary   Number of employees   Class mark
0-4                    5                     2
5-9                    6                     7
10-14                  3                     12
15-19                  2                     17

• Solution: $GM = \sqrt[16]{2^5 \cdot 7^6 \cdot 12^3 \cdot 17^2} \approx 5.85$
• The geometric mean percentage increase in salary is 5.85%.
• If n is a large number, computing the nth root of the
product is tedious work.
• To facilitate the computation of GM, we make use of logarithms:
$\log GM = \frac{1}{n} \sum f_i \log X_i$
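The logarithm approach avoids forming a very large product. The sketch below applies it to the salary-increase table: it averages the frequency-weighted base-10 logs of the class marks and then takes the antilog.

```python
import math

class_marks = [2, 7, 12, 17]
frequencies = [5, 6, 3, 2]

n = sum(frequencies)
# log GM = (1/n) * Σ f_i * log X_i
log_gm = sum(f * math.log10(x) for f, x in zip(frequencies, class_marks)) / n
gm = 10 ** log_gm  # antilog

print(round(gm, 2))  # 5.85
```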

• Harmonic mean: The harmonic mean (HM) of n
positive observations is defined as the number
of values divided by the sum of the reciprocals
of each value:
$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
• It is used for averaging rates, for example speeds.
• Example: Find the HM of 60, 50, and 40.
• Solution: $HM = \frac{3}{\frac{1}{60} + \frac{1}{50} + \frac{1}{40}} = \frac{3}{37/600} = \frac{1800}{37} \approx 48.65$
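• Remark: The harmonic mean is useful and appropriate
for finding average speeds and average rates.
• Example: Suppose a person drove 100 km at 40 km/hr
and returned driving at 50 km/hr. What is the average
speed?
• Solution: The total distance is 200 km and the total time is
100/40 + 100/50 = 2.5 + 2 = 4.5 hours, so
arithmetic (weighted) mean = (2.5 × 40 + 2 × 50) / 4.5 = 200/4.5 ≈ 44.44 km/hr
• This value can also be found by using the harmonic
mean formula:
$HM = \frac{2}{\frac{1}{40} + \frac{1}{50}} = \frac{2}{0.045} \approx 44.44$ km/hr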
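The two routes to the average speed (total distance over total time, and the harmonic mean of the two speeds) can be compared directly. A minimal sketch with illustrative names:

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

print(round(harmonic_mean([60, 50, 40]), 2))  # 48.65

distance = 100     # km each way
speeds = [40, 50]  # km/hr out and back
total_time = sum(distance / s for s in speeds)    # 2.5 + 2 = 4.5 hours
average_speed = len(speeds) * distance / total_time

print(round(average_speed, 2), round(harmonic_mean(speeds), 2))  # 44.44 44.44
```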

• Here, we do not use the simple arithmetic mean to
find the average speed because the person
traveled equal distances at different speeds on
the two legs of the trip.
• If, however, the person had traveled for equal times at
each speed, the arithmetic mean would have been the correct
average.
• If we want to use the arithmetic mean, we have to
take the weights (the time spent at each speed) into account.

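• Harmonic mean for grouped data
$HM = \frac{n}{\sum_{i=1}^{m} \frac{f_i}{X_i}}$, where $f_i$ is the frequency and $X_i$ the class mark of the ith class, and n = $\sum f_i$.

% increase in salary   Number of employees   Class mark
0-4                    5                     2
5-9                    6                     7
10-14                  3                     12
15-19                  2                     17

• Solution: $HM = \frac{16}{\frac{5}{2} + \frac{6}{7} + \frac{3}{12} + \frac{2}{17}} = \frac{16}{3.7248} \approx 4.30$

Relationship between Arithmetic Mean,
Geometric Mean and Harmonic Mean
• For a set of data containing n positively valued
observations, the following relationship always
holds:
$AM \ge GM \ge HM$
• The three means become equal if all values in
the set of data are equal.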
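The ordering of the three means can be illustrated with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). As an assumption for illustration, the grouped salary data are expanded so that each class mark is repeated according to its frequency.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Class marks 2, 7, 12, 17 repeated by their frequencies 5, 6, 3, 2
data = [2] * 5 + [7] * 6 + [12] * 3 + [17] * 2

am = mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)
print(f"{am:.3f} {gm:.3f} {hm:.3f}")  # 7.625 5.850 4.296

# With all values equal, the three means coincide (up to rounding).
same = [5, 5, 5]
print(mean(same), round(geometric_mean(same), 6), harmonic_mean(same))
```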
Median (MD), $\tilde{x}$
• The median of a set of values arranged in
order of magnitude, i.e., in an array, is the
middle value or the arithmetic mean of the two
middle values.
• The median is the value of a variable that divides
an array of items so that the
number of items below it is equal to the number
of items above it.
Median for Ungrouped Data
• If the number of observations n is odd, then the median is the
$\left(\frac{n+1}{2}\right)$th item of the array.
• Example: Find the median of the following data
set: 1, 5, 3, 9, 10, 12, 6
• Solution: First array the data: 1, 3, 5, 6, 9, 10, 12.
Here n = 7, so the median is the 4th item, which is 6.
• If the number of observations n is even, then the median is the
arithmetic mean of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th items of the array.
• Example: Find the median of the following data set: 1, 5, 2, 9, 7,
10, 12, 13
• Solution: First array the data: 1, 2, 5, 7, 9, 10, 12, 13.
Here n = 8, so the median is the mean of the 4th and 5th items: (7 + 9)/2 = 8.
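The odd and even cases can be handled by one small function. This is a sketch of the rule described above (sort first, then take the middle item or the mean of the two middle items); the function name is illustrative.

```python
def median(values):
    ordered = sorted(values)  # array the data first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the middle item
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle items

print(median([1, 5, 3, 9, 10, 12, 6]))      # 6
print(median([1, 5, 2, 9, 7, 10, 12, 13]))  # 8.0
```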

Median for Grouped data
• For grouped data, the median is calculated by using the
following formula:
$\text{Median} = L + \frac{i}{f}\left(\frac{n}{2} - cf\right)$
Where
• L is the lower class boundary/class limit of the
median class
• n is the total number of observations
• cf is the cumulative frequency preceding the
median class
• i is the class interval/width
• f is the frequency of the median class
Remark: The median class is the class with the
smallest cumulative frequency (less than type)
greater than or equal to n/2.
• Example: Find the median from the following
frequency distribution.
Class limit   Frequency   Cumulative frequency
30-40         2           2
40-50         18          20
50-60         24          44
60-70         20          64
70-80         8           72
80-90         3           75
Total         75

Solution:
• Steps: a. Find the cumulative frequencies.
b. Find n/2 = 75/2 = 37.5.
c. Locate the smallest cumulative frequency that is greater than or equal to 37.5, which is 44.
d. In which class does the 38th observation fall? In the 3rd
class, and thus the 3rd class (50-60) is the median class.
e. Find the cumulative frequency preceding the median
class: 20 in this case.
f. Find the class width: 10 in this case.
g. Find the frequency of the median class: 24 in this case.
Therefore, Median = 50 + (10/24)(37.5 − 20) ≈ 57.29.
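The steps above can be sketched as a small function that walks through the classes until the cumulative frequency reaches n/2, then interpolates inside that class. It assumes equal class widths and takes the lower limit of each class as its boundary, as in the example; the names are illustrative.

```python
def grouped_median(lower_bounds, width, frequencies):
    n = sum(frequencies)
    if n == 0:
        raise ValueError("no observations")
    half = n / 2
    cumulative = 0  # cumulative frequency preceding the current class
    for lower, f in zip(lower_bounds, frequencies):
        if cumulative + f >= half:  # first class whose cf reaches n/2
            return lower + (width / f) * (half - cumulative)
        cumulative += f

lower_bounds = [30, 40, 50, 60, 70, 80]
frequencies = [2, 18, 24, 20, 8, 3]
result = grouped_median(lower_bounds, 10, frequencies)
print(round(result, 2))  # 57.29
```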

Properties of Median
• The data must be arranged in an array before the median is calculated.
• There is a unique median for each data set.
• Geometrically, the median divides the histogram or cumulative
frequency curve into two parts with equal area.
• The median is unaffected by the magnitude of the
extreme values.
• It can be calculated for an open-ended frequency
distribution if the median class does not lie in an
open-ended class.
Mode (MO), $\hat{x}$
• The mode is the value of the observation that
appears most frequently in a data set.
• In other words, the mode of a distribution is the value that has
the greatest concentration of observations, i.e., the value that
occurs the greatest number of times in the distribution.
• Example: The examination scores for ten students
are: 81, 93, 84, 75, 68, 87, 81, 75, 81, and 87. Because
the score of 81 occurs three times, more often than any other score, it is the mode.
• A data set may have
  • No mode at all, e.g. 1, 3, 9, 0, 7, 8
  • One mode (unimodal), e.g. 1, 3, 1, 7, 1, 9; the mode is 1
  • Two modes (bimodal), e.g. 7, 2, 4, 4, 7; the modes are 7 and 4
  • Many modes (multimodal), e.g. 1, 0, 0, 1, 3, 2, 2, 3,
    7, 7, 4, 9; the modes are 1, 0, 3, 2, 7
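The four cases (no mode, one, two, or many modes) can be reproduced by counting frequencies. The sketch below adopts one convention as an assumption: when every value occurs only once, it reports no mode rather than treating every value as a mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == highest]

print(modes([81, 93, 84, 75, 68, 87, 81, 75, 81, 87]))  # [81]
print(modes([1, 3, 9, 0, 7, 8]))                        # []
print(modes([7, 2, 4, 4, 7]))                           # [7, 4]
print(modes([1, 0, 0, 1, 3, 2, 2, 3, 7, 7, 4, 9]))      # [1, 0, 3, 2, 7]
```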
Mode of grouped data
• The approximate modal value for grouped data is
calculated by the following formula:
$\text{Mode} = L_o + i \cdot \frac{f - f_1}{(f - f_1) + (f - f_2)}$
Where:
Lo is the lower class boundary of the modal class
(i.e., the class with the highest frequency).
f is the frequency of the modal class.
f1 is the frequency of the class immediately preceding the
modal class.
f2 is the frequency of the class immediately following the
modal class.
i is the class interval.
• Note: the data must be arranged in an array.
Example: Find the mode of the following
distribution:
Class limit   Frequency
90-100        10
100-110       37
110-120       65
120-130       80
130-140       51
140-150       35
150-160       18
160-170       4
• Solution:
• The 4th class is the modal class with f = 80.
• Lo = 120, f = 80, f1 = 65, f2 = 51, and i = 10
• Mode = 120 + 10 × (80 − 65) / ((80 − 65) + (80 − 51)) = 120 + 10 × 15/44 ≈ 123.41
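The grouped-mode calculation can be sketched as follows, using the standard interpolation Mode = Lo + i(f − f1)/((f − f1) + (f − f2)). Treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, needed only when the modal class is first or last.

```python
def grouped_mode(lower_bounds, width, frequencies):
    k = frequencies.index(max(frequencies))  # position of the modal class
    f = frequencies[k]
    f1 = frequencies[k - 1] if k > 0 else 0                     # preceding class
    f2 = frequencies[k + 1] if k < len(frequencies) - 1 else 0  # following class
    return lower_bounds[k] + width * (f - f1) / ((f - f1) + (f - f2))

lower_bounds = [90, 100, 110, 120, 130, 140, 150, 160]
frequencies = [10, 37, 65, 80, 51, 35, 18, 4]
result = grouped_mode(lower_bounds, 10, frequencies)
print(round(result, 2))  # 123.41
```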

Properties of mode
• It is the easiest average to compute.
• It can be obtained for both qualitative and quantitative
data.
• It is not affected by extreme values.
• The mode may not exist for a data set.
• It is not unique. A data set can have more than one
mode.
• The mode is not based on all observations.
• Note: In a symmetrical distribution,
mean = median = mode. In a non-symmetrical
distribution, the mean and the mode lie at the two ends and the
median lies between them. For moderately skewed distributions,
the empirical relationship mean − mode ≈ 3(mean − median) holds.
Distribution, shape and measures of central
tendency
• The relative values of the mean, median and
mode are very much dependent on the
shape of the distribution for the data they
are describing.
• The data distributions may be described in
terms of symmetry and skewness.
• In other words, data can be either symmetric
or skewed depending on how the data are
distributed around the center.
Symmetric (normal, bell-shaped)
distribution:
• A symmetric distribution occurs when the data values are evenly distributed
around the center.
• In a symmetrical distribution, the left and right sides of
the distribution are mirror images of each other, and
the values of the mean, median and mode are equal.
• Skewed distribution: occurs when the data values
are not evenly distributed around the center.
• Skewness refers to the tendency of the distribution to
“tail off” to the right or left.
• Skewness is lack of symmetry of a distribution.
Right (positively) skewed distribution:
• The mean is greater than the median, which in
turn is greater than the mode.
• In such distributions, the median tends to be a
better measure of central tendency than the mean.
• In a positively skewed distribution, the majority
of the data values fall to the left of the mean.
• The arithmetic mean is the largest of the three
measures because it is influenced by a few
extremely high values more than the median or
mode: Mode < Median < Mean
Left (negatively) skewed distribution:
• The mean is less than the median, which in turn
is less than the mode.
• As with the positively skewed distribution, the
median is less influenced by extreme values
and tends to be a better measure of central
tendency than the mean.
• Mean < Median < Mode
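The two orderings can be seen on a small made-up data set. The values below are hypothetical, chosen to have a long right tail; the left-skewed set is simply their mirror image.

```python
from statistics import mean, median, mode

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 20]  # hypothetical data, long right tail
left_skewed = [21 - x for x in right_skewed]    # mirror image, long left tail

# Right skew: Mode < Median < Mean
print(mode(right_skewed), median(right_skewed), mean(right_skewed))  # 2 3.0 5.1
# Left skew: Mean < Median < Mode
print(mean(left_skewed), median(left_skewed), mode(left_skewed))     # 15.9 18.0 19
```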

Quartiles, Deciles and Percentiles
• Descriptive measures that describe the position
(place) of a value in a given data set or distribution are
positional averages.
• Measures that divide data into equal
parts are called quantiles (fractiles).
• The most important of these are quartiles,
deciles and percentiles.
• To obtain such measures, we first have to
arrange the data in increasing order.
Quartiles
• Quartiles divide the data into four equal parts.
The jth quartile, denoted $Q_j$ where j = 1, 2, 3, is
defined as
$Q_j = \left(\frac{j(n+1)}{4}\right)$th item of the ordered data
• Q1 gives the value where 25% of the observations lie
below and 75% above it. (Q1: the lower or first quartile)
• Q2 gives the value where 50% of the observations lie
below and 50% above it. (Q2: the middle or second
quartile)
• Q3 gives the value where 75% of the observations lie
below and 25% above it. (Q3: the upper or third quartile)
• Example: Find the quartiles (Q1, Q2, and Q3)
of the following data: 8, 4, 8, 3, 4, 8,
5, 5, 10
• Solution: Arrange first: 3, 4, 4, 5, 5, 8, 8, 8, 10 (n = 9)
Q1 = 2.5th item = (4 + 4)/2 = 4
Q2 = 5th item = 5
Q3 = 7.5th item = (8 + 8)/2 = 8
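For ungrouped data, a common textbook convention (assumed here) places the jth quartile at position j(n + 1)/4 in the ordered data and interpolates between neighbouring items when the position is fractional. The sketch below generalises this to any number of equal parts; the function name is illustrative.

```python
def quantile_by_position(values, j, parts):
    """jth of `parts` quantiles, at position j(n+1)/parts (1-based) in the sorted data."""
    ordered = sorted(values)
    position = j * (len(ordered) + 1) / parts
    lower_index = int(position)        # whole part of the position
    fraction = position - lower_index  # fractional part, used for interpolation
    if lower_index < 1:
        return ordered[0]
    if lower_index >= len(ordered):
        return ordered[-1]
    lower, upper = ordered[lower_index - 1], ordered[lower_index]
    return lower + fraction * (upper - lower)

data = [8, 4, 8, 3, 4, 8, 5, 5, 10]
quartiles = [quantile_by_position(data, j, 4) for j in (1, 2, 3)]
print(quartiles)  # [4.0, 5.0, 8.0]
```

The same function gives deciles with `parts=10` and percentiles with `parts=100`.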

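• For grouped data,
$Q_i = L + \frac{w_i}{f_i}\left(\frac{i \cdot n}{4} - cf\right)$
Where i = 1, 2, 3
L = lower class boundary of the quartile class (the class which contains the (i·n/4)th observation)
wi = class width
fi = frequency of the quartile class
n = total number of observations
cf = the cumulative frequency of the class preceding the quartile class
Example
Class boundaries   Fi   Cf
5.5-10.5           1    1
10.5-15.5          2    3
15.5-20.5          3    6
20.5-25.5          5    11
25.5-30.5          4    15
30.5-35.5          3    18
35.5-40.5          2    20

The ith quartile class is the class with the smallest
cumulative frequency greater than or equal to i·n/4.
• Q1: n/4 = 5, so the quartile class is 15.5-20.5 (cf = 6). Q1 = 15.5 + (5/3)(5 − 3) ≈ 18.83
• Q2: 2n/4 = 10, so the quartile class is 20.5-25.5 (cf = 11). Q2 = 20.5 + (5/5)(10 − 6) = 24.5
• Q3: 3n/4 = 15, so the quartile class is 25.5-30.5 (cf = 15). Q3 = 25.5 + (5/4)(15 − 11) = 30.5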
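The grouped-data rule (find the class whose cumulative frequency first reaches j·n/4, then interpolate within it) can be sketched as one general function. The signature is an assumption for illustration: `boundaries` lists every class boundary, so it has one more entry than `frequencies`.

```python
def grouped_quantile(boundaries, frequencies, j, parts):
    """jth of `parts` quantiles: L + (w / f) * (j*n/parts - cf)."""
    n = sum(frequencies)
    target = j * n / parts
    cumulative = 0  # cf of the classes preceding the current one
    for k, f in enumerate(frequencies):
        if f > 0 and cumulative + f >= target:
            lower = boundaries[k]
            width = boundaries[k + 1] - lower
            return lower + (width / f) * (target - cumulative)
        cumulative += f
    raise ValueError("no observations")

boundaries = [5.5, 10.5, 15.5, 20.5, 25.5, 30.5, 35.5, 40.5]
frequencies = [1, 2, 3, 5, 4, 3, 2]
q1, q2, q3 = (grouped_quantile(boundaries, frequencies, j, 4) for j in (1, 2, 3))
print(round(q1, 2), q2, q3)  # 18.83 24.5 30.5
```

Passing `parts=10` or `parts=100` gives the corresponding decile or percentile, since only the target position j·n/parts changes.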

Deciles
• Deciles are measures that divide a distribution/data set
into ten equal parts.
• The jth decile for a simple frequency distribution
(ungrouped data), denoted $D_j$ where j = 1, 2, 3, ..., 9, is
defined as
$D_j = \left(\frac{j(n+1)}{10}\right)$th item of the ordered data
• D1 gives the value where 10% of the observations lie
below and 90% above it. (D1 covers 10% of the distribution.)
• D2 gives the value where 20% of the observations lie
below and 80% above it. (D2 covers 20% of the distribution.)
• D3 gives the value where 30% of the observations lie
below and 70% above it.
.
.
• D9 gives the value where 90% of the observations lie
below and 10% above it. (D9 covers 90% of the distribution.)
• For grouped data,
$D_i = L + \frac{w_i}{f_i}\left(\frac{i \cdot n}{10} - cf\right)$
• Where i = 1, 2, 3, 4, ..., 9
L = lower class boundary of the ith decile class (the class which contains
the (i·n/10)th observation)
The ith decile class is the class with the smallest cumulative
frequency greater than or equal to i·n/10.

Percentiles:
• Percentiles are measures that divide the frequency distribution into
one hundred equal parts.
• The values of the variable corresponding to these divisions are
denoted P1, P2, ..., P99, often called the first, the second, ..., the
ninety-ninth percentile respectively.
• The jth percentile class is the class with the smallest cumulative
frequency greater than or equal to j·n/100.
Importance:
• The quartiles are more widely used in
Economics and Business, while
the deciles and percentiles are important in
Psychology and Educational Statistics
concerning grades, rates, ranks, etc.
• The working principle for computing the
partition values is basically the same as that of
computing the median.

• Example: For the data given below, compute the values
of the quartiles, D3, D7, P15 and P88, and interpret them.
Marks             Below 10   10-20   20-40   40-60   60-80   Above 80
No. of students   10         15      25      30      14      6
(n = 100; the cumulative frequencies are 10, 25, 50, 80, 94, 100.)

Solution:
Q1: size of item = (n/4)th item = 25th item; quartile class 10-20
l = 10, c = 10, f = 15, c.f = 10
$Q_1 = l + \frac{c}{f}\left(\frac{n}{4} - c.f\right) = 10 + \frac{10}{15}(25 - 10) = 20$
Mark of 25% of students is less than 20.

Q2: size of item = (2n/4)th item = 50th item; quartile class 20-40
l = 20, c = 20, f = 25, c.f = 25
$Q_2 = l + \frac{c}{f}\left(\frac{2n}{4} - c.f\right) = 20 + \frac{20}{25}(50 - 25) = 40$
Mark of half of students is below 40.

Q3: size of item = (3n/4)th item = 75th item; quartile class 40-60
l = 40, c = 20, f = 30, c.f = 50
$Q_3 = l + \frac{c}{f}\left(\frac{3n}{4} - c.f\right) = 40 + \frac{20}{30}(75 - 50) = 56.67$
Mark of three-fourths of students is below 56.67.

D3: size of item = (3n/10)th item = 30th item; decile class 20-40
l = 20, c = 20, f = 25, c.f = 25
$D_3 = l + \frac{c}{f}\left(\frac{3n}{10} - c.f\right) = 20 + \frac{20}{25}(30 - 25) = 24$
Mark of 30% of students is below 24.

D7: size of item = (7n/10)th item = 70th item; decile class 40-60
l = 40, c = 20, f = 30, c.f = 50
$D_7 = l + \frac{c}{f}\left(\frac{7n}{10} - c.f\right) = 40 + \frac{20}{30}(70 - 50) = 53.33$
Mark of 70% of students is below 53.33.

P15: size of item = (15n/100)th item = 15th item; percentile class 10-20
l = 10, c = 10, f = 15, c.f = 10
$P_{15} = l + \frac{c}{f}\left(\frac{15n}{100} - c.f\right) = 10 + \frac{10}{15}(15 - 10) = 13.33$
Mark of 15% of students is below 13.33.

P88: size of item = (88n/100)th item = 88th item; percentile class 60-80
l = 60, c = 20, f = 14, c.f = 80
$P_{88} = l + \frac{c}{f}\left(\frac{88n}{100} - c.f\right) = 60 + \frac{20}{14}(88 - 80) = 71.43$
Mark of 88% of students is below 71.43.
• Note that:
i. Q1 = P25, Q2 = D5 = P50 = median, Q3 = P75
ii. D1 = P10, D2 = P20, D3 = P30, …, D9 = P90.
