5.0 Summary Statistics (1)

Mathematics in the Modern World

Math 100

Module 5: Measures of Central Tendency


Measures of Relative Standing
Measures of Variability
Prof. Ronald F. Judan
University of the Cordilleras
Baguio City, Philippines
• A manager in a garment factory might ask his foremen for the
number of garments produced daily by their firm over a period of
three months.
• The manager will not have the time to go through 90
different figures (3 months = 90 days). Thus, the foreman may
just state the average production per day, which is a single
figure.
• Although production may differ from day to day, a single figure
will suffice to characterize general production. Instead of going
into all the details of a given distribution of items, it is more
convenient to find the number that best characterizes this
distribution.
• MEASURES OF CENTRAL TENDENCY

• Def. 1: A measure of central tendency is a single figure that is
considered typical or representative of all the values in the group.
This is the value around which the scores or observations in the
set tend to cluster.
• Def. 2: The mean is the most popularly used measure of central
tendency. This is usually referred to as the average and is
denoted by $\bar{X}$ (read: x bar).
USES AND PROPERTIES OF THE MEASURES OF CENTRAL TENDENCY

1. MEAN (Average)
• One computes the mean by using all the values of the data.
• The mean is used in computing other statistics, such as the variance.
• The mean for the data set is unique and not necessarily one of the data values.
• The mean is affected by extremely high or low values, and may not be the
appropriate average to use in these situations.
• The mean is an appropriate measure of central tendency for interval and ratio
variables, hence it is also known as an interval statistic.

2. MEDIAN
• The median is used when one must find the center or middle
value of a data set.
• The median is used when one must determine whether the data
fall into the upper half or lower half of the distribution.
• The median is affected less than the mean by extremely high
or extremely low values.
• The median is used for ordinal or ranked measurements;
hence it is also called an ordinal statistic.
3. MODE
• The mode is used when the most typical case is desired.
• The mode is the easiest average to compute.
• The mode can be used when the data are nominal, such as
gender, political affiliations, or religious preferences.
COMPUTATIONS OF THE MEAN FOR UNGROUPED DATA
The simple arithmetic mean, or simple mean, is the sum of the values divided by
the number of values in the group.
Sample mean ($\bar{X}$):
• $\bar{X} = \frac{\sum x}{n}$
Where:
• $\bar{X}$ is the sample mean
• $x$ are the values in a given sample
• $\sum x$ is the sum or total of all the values or scores in the sample
• $n$ is the total number of scores or values in the sample (the sample size)
• $N$ is the total number of values in the population (the population size)

• For the population mean we use $\mu = \frac{\sum x}{N}$
Example 2:
• Find the mean of the following values, which show the scores
of 10 students in Math Ed 503: 10, 5, 4, 3, 11, 7, 9, 12, 8, 6
• Sol’n: $\bar{x} = \frac{10+5+4+3+11+7+9+12+8+6}{10} = \frac{75}{10} = 7.5$
Therefore: the mean score of the 10 students in Math Ed 503 is
7.5.
Example 3: What should be the score of the 11th student in Example 2
so that the average is 8?
• Sol’n: Let s be the score of the 11th student.
• $\bar{X} = \frac{\sum x}{n}$, so $8 = \frac{10+5+4+3+11+7+9+12+8+6+s}{11} = \frac{75+s}{11}$
• $s = 88 - 75 = 13$
• Therefore: the score of the 11th student must be 13 for the data set
to have an average of 8.
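The two mean examples above can be checked with a short Python sketch. The function names `mean` and `score_needed` are illustrative choices, not part of the module; the second function simply rearranges the mean formula to solve for one extra score.

```python
def mean(values):
    """Simple arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


def score_needed(values, target_mean):
    """Score that one additional observation needs so the new mean equals target_mean."""
    # target_mean = (sum(values) + s) / (n + 1)  =>  s = target_mean * (n + 1) - sum(values)
    return target_mean * (len(values) + 1) - sum(values)


scores = [10, 5, 4, 3, 11, 7, 9, 12, 8, 6]
print(mean(scores))             # 7.5
print(score_needed(scores, 8))  # 13
```

Appending the computed score to the list and calling `mean` again is a quick way to confirm that the new average is 8.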
Weighted Mean
• The weighted mean refers to the average of the means of all the
groups given. It is sometimes used when a "typical" value is
required but you want to give greater weight to some
measurements than others.
• An arithmetic mean computed by considering the relative
importance of each item is called a weighted arithmetic mean. To
give due importance to each item under consideration, we assign
a number called a weight to each item in proportion to its relative
importance.
• The weighted arithmetic mean is computed by using the following formula:
• $\bar{X}_W = \frac{\sum WX}{\sum W}$
• Where:
• $\bar{X}_W$ stands for the weighted arithmetic mean
• $X$ stands for the values of the items
• $W$ stands for the weight of each item
• $\sum WX$ is the sum of the products of X and W
• $\sum W$ is the sum of the weights
Example 1:
A student obtained 40, 50, 60, 80, and 45 marks in Math, Statistics,
Physics, Chemistry, and Biology, respectively. Assuming weights of 5,
2, 4, 3, and 1, respectively, for these subjects, find the
weighted arithmetic mean per subject.

Solution:

Subjects      Marks Obtained (X)   Weight (W)   W*X
Math          40                   5            200
Statistics    50                   2            100
Physics       60                   4            240
Chemistry     80                   3            240
Biology       45                   1             45
Total                              ΣW = 15      ΣWX = 825

Now we find the weighted arithmetic mean as:
$\bar{X}_W = \frac{\sum WX}{\sum W} = \frac{825}{15} = 55$ marks/subject
• Example 2: A class of 25 students took a science test. 10 students had
an average (arithmetic mean) score of 80. The other students had an
average score of 60. What is the average score of the whole class?
• Solution:
• Step 1: To get the sum of weighted terms, multiply each average by
the number of students that had that average and then sum them up.
• 80 × 10 + 60 × 15 = 800 + 900 = 1700
• Step 2: Total number of terms = Total number of students = 25
• Step 3: Using the formula
• $\text{Weighted Average} = \frac{\text{Sum of weighted terms}}{\text{Total no. of terms}} = \frac{1700}{25} = 68$
• Answer: The average score of the whole class is 68.
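Both weighted-mean examples follow the same pattern: multiply each value by its weight, add the products, and divide by the sum of the weights. A minimal sketch, assuming the values and weights are given as two parallel lists (the function name `weighted_mean` is illustrative):

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * X) / sum(W)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Example 1: marks per subject with subject weights
print(weighted_mean([40, 50, 60, 80, 45], [5, 2, 4, 3, 1]))  # 55.0

# Example 2: group averages weighted by the number of students in each group
print(weighted_mean([80, 60], [10, 15]))  # 68.0
```

In the second call the group sizes act as the weights, which is why the class average (68) sits closer to the larger group's average of 60.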
Def 4. Median (Mdn) is defined as the point in a distribution with 50 percent of
the measures or scores on each side of it; that is, the median is the midpoint of
a distribution.

COMPUTATION OF THE MEDIAN FOR UNGROUPED DATA


Note: When data are ungrouped, the individual scores or values should be arranged in
ascending (or descending) order before the middle item is identified.
Formula:
Mdn = the $\left(\frac{N+1}{2}\right)$th score or value, where N is the number of scores or values

odd-numbered set – when the number of observations is odd, the middle value is the
median.
even-numbered set – when the number of observations is even, the median is the mean or
average of the two middle scores.
Illustrative Examples:
1. What is the median amount spent on groceries when 7 customers spent the
following (in dollars): 15, 18, 10, 4, 27, 5, 32?
• Arranged in ascending order: 4 5 10 15 18 27 32. The median is the (7+1)/2 = 4th value.
Therefore: The median amount spent on groceries among the 7 customers is $15.
2. Suppose we add one more customer who spent 25 dollars on groceries in the
above example. What will be the median amount?
• Arranged in ascending order: 4 5 10 15 18 25 27 32. The median is the mean of the 4th and 5th values: (15+18)/2 = 16.5
Therefore: The median amount spent on groceries among the 8 customers is $16.50.
3. How about the following series of scores? Find the median.
a. 1 3 6 6 6 8 10
Therefore: The median score is 6.
b. 12 14 15 16 16 16 17 18 20 21
Therefore: The median score is 16.
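The odd and even cases can be written as one small Python function. This is a sketch of the rule stated above (sort first, then take the middle value or the mean of the two middle values); the name `median` is an illustrative choice, and Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle item(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th value, which is index n // 2 when counting from 0
        return ordered[mid]
    # Even n: the mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([15, 18, 10, 4, 27, 5, 32]))      # 15
print(median([15, 18, 10, 4, 27, 5, 32, 25]))  # 16.5
```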
Def. 5: The mode ( Mo ) is defined as the score which has the highest
frequency or the number that occurs most often in a set of data.

Note:
• A distribution can have more than one mode.
• If all the scores or values appear the same number of times, there
exists no mode.

COMPUTATION OF THE MODE FOR UNGROUPED DATA

For ungrouped data, the mode can be found by inspection.


Illustrative Examples:
1. Consider the following number of children per family among a sample of 10
families: Determine the mode.
3 5 6 5 5 4 5 5 1 5
Therefore : The Mode is 5

2. Find the mode of the following sets of data:
a. 23 27 20 36 34 31
Ans. No mode
b. 10 11 9 9 7 9 5
Ans. The mode is 9
c. A set of five 3’s, ten 2’s, seven 5’s, and ten 4’s.
Ans. The modes are 2 and 4, so the distribution is bimodal.
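Finding the mode by inspection amounts to counting how often each value occurs. The sketch below uses `collections.Counter` for the counting and returns a list so that bimodal data can be reported. The function name `modes` is illustrative, and returning an empty list for "no mode" is an assumption made here to match the note that no mode exists when all values occur equally often.

```python
from collections import Counter


def modes(values):
    """Return the list of modes in ascending order, or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Convention from the note: if every distinct value occurs equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 6, 5, 5, 4, 5, 5, 1, 5]))          # [5]
print(modes([23, 27, 20, 36, 34, 31]))                # []
print(modes([3] * 5 + [2] * 10 + [5] * 7 + [4] * 10)) # [2, 4]
```

Note that Python's built-in `statistics.multimode` follows a different convention: when all values tie, it returns every value instead of an empty list.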
Measures of Relative Standing
• Point measures are those that divide a class frequency distribution of a
variable into a number of equal parts.
• Types:
1. Quartile – divides the class frequency distribution of a variable
theoretically into four equal parts.
Characteristics and Uses:
The first quartile, Q1, indicates that ¼ of the group lies below it and ¾ of the
group lies above it. The third quartile, Q3, indicates that ¾ of
the group lies below it and ¼ of the group lies above it. The second quartile,
Q2, is equal to the median: ½ of the group
lies below it and the other half of the group lies above it.
2. Decile – a point measure that divides the class
frequency distribution of a variable into ten equal parts.
Hence decile means one-tenth.
• Characteristics and Uses:


• Decile divides a distribution into 10 equal parts. Hence if a
group is to be divided into 10 subgroups according to some
trait such as ability, the decile may be used.
• The decile is seldom used in statistical work.
$P_i = L_{PC} + i\left(\frac{\frac{k(n)}{100} - Cf_B}{f_{PC}}\right)$
3. Percentile – divides the distribution into 100 equal parts.

• Characteristics and Uses:
• The percentile is used to indicate level of intelligence in
comparison with others. If a score of 55 is equivalent to P65, the
individual with a score of 55 is theoretically more intelligent
than 65% of those who took the test and 35% are more
intelligent than he.
• Used as a ranking or comparison standard.
• Used in transmuting raw scores into school marks
Problem: Given the series: A) 3, 5, 2, 7, 6, 4, 9. B) 3, 5, 2, 7, 6, 4, 9, 1.
• Calculate the following: 1) Q1, 2) Q3, 3) D2, 4) D7, 5) P32, 6) P85
• Sol’n A: 3, 5, 2, 7, 6, 4, 9. Ascending arrangement: 2 3 4 5 6 7 9 (n = 7)
• Formula for the position of the kth quartile: the [k(n+1)/4]th value, k = 1, 2, 3
• Q1 = 1*(8/4) = 2, so 2nd position; thus the value of Q1 = 3
• Therefore: 25% of the data set lies below a score of 3.
• Q3 = 3*(8/4) = 6, so 6th position; thus the value of Q3 = 7
• Therefore: 75% of the data set lies below a score of 7.
• Ascending arrangement: 2 3 4 5 6 7 9
• Position of the kth decile: the [k(n+1)/10]th value; position of the kth percentile: the [k(n+1)/100]th value
• D2 = 2*(8/10) = 1.6th position, so D2 = 2+0.6(3-2) = 2.6
• Therefore: 20% of the data set lies below a score of 2.6
• D7 = 7*(8/10) = 5.6th position, so D7 = 6+0.6(7-6) = 6.6
• Therefore: 70% of the data set lies below a score of 6.6
• P32 = 32(8/100) = 2.56th position, so P32 = 3+0.56(4-3) = 3.56
• Therefore: 32% of the data set lies below a score of 3.56
• P85 = 85(8/100) = 6.8th position, so P85 = 7+0.8(9-7) = 8.6
• Therefore: 85% of the data set lies below a score of 8.6
• Sol’n B: 3, 5, 2, 7, 6, 4, 9, 1. Ascending arrangement: 1 2 3 4 5 6 7 9 (n = 8)
• Formula for the position of the kth quartile: the [k(n+1)/4]th value, k = 1, 2, 3
• Q1 = 1*(9/4) = 2.25th position, so Q1 = 2+0.25(3-2) = 2.25
• Therefore: 25% of the data set lies below a score of 2.25.
• Q3 = 3*(9/4) = 6.75th position, so Q3 = 6+0.75(7-6) = 6.75
• Therefore: 75% of the data set lies below a score of 6.75.
• Sol’n B cont’d: 3, 5, 2, 7, 6, 4, 9, 1. Ascending arrangement: 1 2 3 4 5 6 7 9
• Position of the kth decile: the [k(n+1)/10]th value; position of the kth percentile: the [k(n+1)/100]th value
• D2 = 2*(9/10) = 1.8th position, so D2 = 1+0.8(2-1) = 1.8
• Therefore: 20% of the data set lies below a score of 1.8
• D7 = 7*(9/10) = 6.3th position, so D7 = 6+0.3(7-6) = 6.3
• Therefore: 70% of the data set lies below a score of 6.3
• P32 = 32(9/100) = 2.88th position, so P32 = 2+0.88(3-2) = 2.88
• Therefore: 32% of the data set lies below a score of 2.88
• P85 = 85(9/100) = 7.65th position, so P85 = 7+0.65(9-7) = 8.3
• Therefore: 85% of the data set lies below a score of 8.3
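The quartile, decile, and percentile solutions all use the same two steps: find the k(n+1)/parts-th position in the ordered data, then interpolate linearly between the two neighbouring values when the position is not a whole number. A minimal Python sketch of that procedure follows; the function name `positional_quantile` and its `parts` parameter (4, 10, or 100) are illustrative assumptions, and k is assumed to be a whole number.

```python
def positional_quantile(values, k, parts):
    """Value at the k(n+1)/parts-th position of the ordered data, with linear interpolation.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Split the position k(n+1)/parts into its whole part and the leftover numerator
    whole, remainder = divmod(k * (n + 1), parts)
    if whole < 1 or whole > n or (whole == n and remainder):
        raise ValueError("position falls outside the data")
    lower = ordered[whole - 1]          # value at the whole-numbered position
    if remainder == 0:
        return lower
    upper = ordered[whole]              # next value up
    return lower + (remainder / parts) * (upper - lower)


series_a = [3, 5, 2, 7, 6, 4, 9]
series_b = [3, 5, 2, 7, 6, 4, 9, 1]
requests = [("Q1", 1, 4), ("Q3", 3, 4), ("D2", 2, 10),
            ("D7", 7, 10), ("P32", 32, 100), ("P85", 85, 100)]
for name, series in [("A", series_a), ("B", series_b)]:
    for label, k, parts in requests:
        print(name, label, round(positional_quantile(series, k, parts), 2))
```

Other textbooks and software packages use different position formulas, so their quartiles and percentiles can differ slightly from the values this rule produces.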
• Finding the Percentile rank:
• The percentile corresponding to a given value (x) is computed by using
the formula:
• $\text{Percentile} = \frac{(\text{no. of values below } x) + 0.5}{\text{total no. of values}} \times 100\%$
• Ex. 1. A teacher gives a 20-point test to 10 students. Find the
percentile rank of a score of 12. Given scores: 18, 15, 12, 6, 8, 2, 3, 5,
20, 10.
• Ordered set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. Six scores lie below 12.
• Percentile = [(6 + 0.5)/10](100%) = 65%
• Interpretation: A student with a score of 12 did better than 65% of the
class.
• Ex. 2. Find the value of the 75th percentile for the following scores in a test: 60,
55, 30, 80, 45, 57, 68, 81, 72, 37, 44, 32, 39, 41, 56.
• The ordered data set is: 30, 32, 37, 39, 41, 44, 45, 55, 56, 57, 60, 68, 72, 80, 81
• Since n = 15 and k = 75, we have,
• Position of P75 = k(n+1)/100
• Position of P75 = (75 x 16)/100 = 12, hence,
• The 75th percentile is the value in the 12th position of the ordered set, counting from the
lowest value; this score is 68.
• Interpretation: If a student scored 68 in the test, he would have done better than
75% of the class.
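The percentile-rank formula and the position rule used in these two examples translate directly into a few lines of Python. This is a sketch only; `percentile_rank` is an illustrative name, and the position calculation uses integer division because 75 × 16 happens to be exactly divisible by 100 in this example.

```python
def percentile_rank(values, x):
    """Percentile rank of x: (number of values below x + 0.5) / total number of values * 100."""
    below = sum(1 for v in values if v < x)
    return 100 * (below + 0.5) / len(values)


# Ex. 1: percentile rank of a score of 12
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # 65.0

# Ex. 2: value of the 75th percentile via the position k(n + 1)/100
test_scores = [60, 55, 30, 80, 45, 57, 68, 81, 72, 37, 44, 32, 39, 41, 56]
position = 75 * (len(test_scores) + 1) // 100  # 12, a whole number in this example
print(sorted(test_scores)[position - 1])       # 68
```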
Measures of Variability
• What is a good measure of variability?
• The measures of variability or dispersion are quantities that measure
the spread or variability of the observations or measurements in a
data set.

• Introduction
• Measures of average such as the median and mean represent the
typical value for a dataset. Within the dataset the actual values usually
differ from one another and from the average value itself. The extent
to which the median and mean are good representatives of the values
in the original dataset depends upon the variability or dispersion in
the original data. Datasets are said to have high dispersion when they
contain values considerably higher and lower than the mean value.
• In figure 1 the number of different sized tutorial groups in semester 1
and semester 2 are presented. In both semesters the mean and median
tutorial group size is 5 students, however the groups in semester 2 show
more dispersion (or variability in size) than those in semester 1.
• Dispersion within a dataset can be measured or described in several
ways including the range, inter-quartile range and standard deviation.
• Range - The range is the difference between the highest value and the lowest value
in the ungrouped data set. In the grouped data set, the range is the difference
between the upper limit of the highest interval and the lower limit of the lowest class
interval.
• For ungrouped data, R = HV – LV, where HV is the highest value and LV is the lowest value
• Properties of the Range
• It is quick to find but gives only a rough measure of dispersion.
• The larger the value of the range, the more dispersed the observations.
• It considers only the lowest and highest values in the population.
• In figure 1, the size of the largest semester 1 tutorial group is 6
students and the size of the smallest group is 4 students, resulting
in a range of 2 (6-4). In semester 2, the largest tutorial group size
is 7 students and the smallest tutorial group contains 3 students,
therefore the range is 4 (7-3).
• The range is simple to compute and is useful when you wish to
evaluate the whole of a dataset.
• The range is useful for showing the spread within a dataset and
for comparing the spread between similar datasets.
• An example of the use of the range to compare spread within datasets is
provided in table 1. The scores of individual students in the examination
and coursework component of a module are shown.

• To find the range in marks the highest and lowest values need to be found
from the table. The highest coursework mark was 48 and the lowest was
27 giving a range of 21. In the examination, the highest mark was 45 and
the lowest 12 producing a range of 33. This indicates that there was wider
variation in the students’ performance in the examination than in the
coursework for this module.
Range
• Since the range is based solely on the two most extreme values
within the dataset, if one of these is either exceptionally high or
low (sometimes referred to as an outlier), it will result in a range that
is not typical of the variability within the dataset. For example,
imagine in the above example that one student failed to hand in
any coursework and was awarded a mark of zero, but
sat the exam and scored 40. The range for the coursework marks
would now become 48 (48-0) rather than 21; however, the new
range is not typical of the dataset as a whole and is distorted by
the outlier in the coursework marks. In order to reduce the
problems caused by outliers in a dataset, the inter-quartile range
is often calculated instead of the range.
• Interquartile Range - The IQR is the amount of spread between the first quartile and
the third quartile. In effect, it shows the range of
the middle 50% of the data and, as such, is not affected by the extreme values in the data
set. Formula: IQR = Q3 - Q1
• Properties of the Inter Quartile Range
• It measures the dispersion in the middle half of the items arranged in an array
• The inter-quartile range is a measure that indicates the extent to which the central
50% of values within the dataset are dispersed. It is based upon, and related to, the
median.
• In the same way that the median divides a dataset into two halves, it can be further
divided into quarters by identifying the upper and lower quartiles. The lower quartile
is found one quarter of the way along a dataset when the values have been arranged
in order of magnitude; the upper quartile is found three quarters along the dataset.
Therefore, the upper quartile lies half way between the median and the highest
value in the dataset whilst the lower quartile lies halfway between the median and
the lowest value in the dataset. The inter-quartile range is found by subtracting the
lower quartile from the upper quartile.
• For example, the examination marks for 20 students following a particular
module are arranged in order of magnitude.

• The median lies at the mid-point between the two central values (10th and 11th)
• = half-way between 60 and 62 = 61
• The lower quartile lies at the mid-point between the 5th and 6th values
• = half-way between 52 and 53 = 52.5
• The upper quartile lies at the mid-point between the 15th and 16th values
• = half-way between 70 and 71 = 70.5
• The inter-quartile range for this dataset is therefore 70.5 - 52.5 = 18 whereas the
range is:
• 80 - 43 = 37.
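The range and inter-quartile range calculations can be sketched in Python as follows. The quartiles here are taken as the medians of the lower and upper halves of the ordered data, which for 20 values gives the mid-points between the 5th and 6th and between the 15th and 16th values. The list `marks` is hypothetical: the original table of marks is not reproduced in this text, so these 20 values were made up to agree with the figures quoted in the example (lowest 43, highest 80, and the stated middle values). The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def range_and_iqr(values):
    """Return (range, inter-quartile range), using medians of the two halves as quartiles."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    lower_quartile = median_of_sorted(ordered[:half])   # lower half of the data
    upper_quartile = median_of_sorted(ordered[-half:])  # upper half (middle value excluded if n is odd)
    return ordered[-1] - ordered[0], upper_quartile - lower_quartile


# Hypothetical marks for 20 students, chosen to match the values quoted in the example
marks = [43, 47, 49, 51, 52, 53, 55, 57, 58, 60,
         62, 63, 65, 67, 70, 71, 73, 75, 78, 80]
print(range_and_iqr(marks))  # (37, 18.0)
```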
IQR
• The inter-quartile range provides a clearer picture of the overall dataset
by removing/ignoring the outlying values.
• Like the range however, the inter-quartile range is a measure of
dispersion that is based upon only two values from the dataset.
Statistically, the standard deviation is a more powerful measure of
dispersion because it takes into account every value in the dataset. The
standard deviation is explored in the next section of this guide.
c. The Variance - The variance, denoted by $\sigma^2$, is the mean of the squared deviations of
the observations from their arithmetic mean.

For ungrouped data:
• Population variance: $\sigma^2 = \frac{\sum x^2}{N} - \mu^2$
• Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n-1}$

• Properties of the Variance


• It is always non-negative.
• It is easy to manipulate for further mathematical treatment.
• It makes use of all observations.
• Its unit of measure is the square of the unit of measure of the given set of values.
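The two ungrouped-data variance formulas can be written out and compared with Python's standard `statistics` module. This is a sketch, not a prescribed method: the function names are illustrative, and the data reuse the ten Math Ed 503 scores from the mean example purely as sample input.

```python
import statistics


def population_variance(values):
    """Population variance: (sum of x^2) / N - mu^2."""
    n = len(values)
    mu = sum(values) / n
    return sum(x * x for x in values) / n - mu ** 2


def sample_variance(values):
    """Sample variance: sum of (x - x_bar)^2 divided by (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


scores = [10, 5, 4, 3, 11, 7, 9, 12, 8, 6]
print(population_variance(scores))        # 8.25
print(round(sample_variance(scores), 4))  # 9.1667

# The standard library offers the same two quantities
print(statistics.pvariance(scores), statistics.variance(scores))
```

The sample variance is larger than the population variance for the same data because it divides by n - 1 instead of N.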
The Standard Deviation - The standard deviation, denoted by
σ, is the positive square root of the variance.

• The standard deviation measures how concentrated the data are around
the mean; the more concentrated, the smaller the standard deviation.
• A small standard deviation can be a goal in certain situations where the
results are restricted, for example, in product manufacturing and quality
control. A particular type of car part that has to be 2 centimeters in
diameter to fit properly had better not have a very big standard
deviation during the manufacturing process. A big standard deviation in
this case would mean that lots of parts end up in the trash because they
don’t fit right; either that or the cars will have problems down the road.
• But in situations where you just observe and record data, a large
standard deviation isn’t necessarily a bad thing; it just reflects a large
amount of variation in the group that is being studied. For example, if
you look at salaries for everyone in a certain company, including
everyone from the student intern to the CEO, the standard deviation
may be very large. On the other hand, if you narrow the group down by
looking only at the student interns, the standard deviation is smaller,
because the individuals within this group have salaries that are less
variable. The second data set isn’t better, it’s just less variable.
• Here are some properties that can help you when interpreting a
standard deviation:
• The standard deviation can never be a negative number, due to the way
it’s calculated and the fact that it measures a distance (distances are
never negative numbers).
• The smallest possible value for the standard deviation is 0, and that
happens only in contrived situations where every single number in the
data set is exactly the same (no deviation).
• The standard deviation is affected by outliers (extremely low or
extremely high numbers in the data set). That’s because the standard
deviation is based on the distance from the mean. And remember, the
mean is also affected by outliers.
• The standard deviation has the same units as the original data.
• Properties of the Standard Deviation
• It is always non-negative
• It is easy to manipulate for further mathematical treatment.
• It makes use of all observations.
• Its unit of measure is the same unit of measure of the given set of values.
• Population SD: $\sigma = \sqrt{\sigma^2}$ and Sample SD: $s = \sqrt{s^2}$
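Since the standard deviation is the square root of the variance, it can be computed in one extra step. The sketch below uses Python's `statistics` and `math` modules and, as an assumption for illustration, reuses the ten Math Ed 503 scores from the mean example. It also checks two of the listed properties: the standard deviation is 0 when all values are equal, and it grows when an outlier is added.

```python
import math
import statistics

scores = [10, 5, 4, 3, 11, 7, 9, 12, 8, 6]

# Standard deviation as the positive square root of the variance
population_sd = math.sqrt(statistics.pvariance(scores))
sample_sd = math.sqrt(statistics.variance(scores))
print(round(population_sd, 4))  # 2.8723
print(round(sample_sd, 4))      # 3.0277

# Property: no deviation at all gives a standard deviation of 0
print(statistics.pstdev([4, 4, 4, 4]))  # 0.0

# Property: an outlier (here a made-up score of 100) inflates the standard deviation
print(statistics.pstdev(scores + [100]) > population_sd)  # True
```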

E. Coefficient of Variation - The coefficient of variation, denoted by CV, is the
ratio of the standard deviation to the mean, expressed in percent. In
symbols,
• Population: $CV = \frac{\sigma}{\mu} \times 100\%$
• Sample: $CV = \frac{s}{\bar{x}} \times 100\%$

• Properties of the Coefficient of Variation


• It is a unitless quantity.
• It can be used to compare the dispersion of two or more populations measured in
the same or different units.
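Because the coefficient of variation is unitless, it can compare the spread of data sets measured in different units. A minimal sketch of the sample CV follows; the function name and both data sets (heights in centimetres and weights in kilograms) are hypothetical values made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample coefficient of variation in percent: (s / x_bar) * 100."""
    x_bar = statistics.mean(values)
    if x_bar == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / x_bar * 100


# Hypothetical data in two different units
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [45, 55, 60, 70, 90]

print(round(coefficient_of_variation(heights_cm), 2))
print(round(coefficient_of_variation(weights_kg), 2))

# Converting the heights to inches changes the standard deviation but not the CV
heights_in = [h / 2.54 for h in heights_cm]
print(round(coefficient_of_variation(heights_in), 2))
```

In this made-up example the weights have the larger CV, so they are relatively more dispersed than the heights even though the two sets are in different units.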
• Thanks for Listening
