MATH2016 - 2021 - S2 Notes Week 1
Week 1 & 2
Statistics is a collection of methods for collecting, analyzing, presenting and interpreting data and for
making decisions.
Definition An element or member of a sample is a specific subject or object about which data is
collected.
Example If we are interested in the set of all cars in a city, this set will be the population. To obtain
information about this population, we may select some cars from the population and study them. This
subset of cars would be the sample.
Definition Descriptive statistics consists of methods for organizing, displaying and describing data by
using tables, graphs and summary measures.
Definition Inferential statistics consists of methods that use sample results to help make decisions or
predictions about a population.
Example In the previous example, we may obtain the ages of the cars in the sample to get an idea of the
ages of cars in the population.
Definition A variable is a characteristic under study that assumes different values for different
elements.
Definition A qualitative variable cannot assume a numerical value, but can be classified into two or more
nonnumeric categories.
Definition A variable whose values are countable is called a discrete variable.
Definition A variable that can assume any numerical value over a certain interval or intervals is called a
continuous variable.
Frequency Distribution
Example In a test, students obtained the following marks out of a maximum of 10: 5, 6, 9, 3, 8, 8, 9, 10,
6, 8, 5, 8, 9, 3, 3, 3, 3, 8, 8, 3
Frequency distribution
Mark f
3 6
4 0
5 2
6 2
7 0
8 6
9 3
10 1
Note For a small number of discrete values, a frequency distribution like the above one is suitable.
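The frequency distribution above can be checked by counting the marks directly. The following is a minimal Python sketch, not part of the original notes; the variable names `marks`, `counts` and `freq` are illustrative.

```python
from collections import Counter

# Marks from the example (maximum possible mark is 10).
marks = [5, 6, 9, 3, 8, 8, 9, 10, 6, 8, 5, 8, 9, 3, 3, 3, 3, 8, 8, 3]

counts = Counter(marks)

# List every mark between the smallest and largest observed value,
# so that marks that never occur appear with frequency 0.
freq = {m: counts.get(m, 0) for m in range(min(marks), max(marks) + 1)}

for mark, f in freq.items():
    print(mark, f)
```

The frequencies add up to the number of students, which is a quick check that no mark was missed.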
Histogram A histogram is a graph with the classes on the horizontal axis and the frequencies (or relative
frequencies or percentages) on the vertical axis.
Example Students wrote a test, for which the maximum possible mark was 25. The marks obtained by
the students were 5,6,6,8,11, 11, 11, 12, 12, 13, 14, 14, 14, 14, 15, 16, 16, 16, 16, 16, 16, 17, 18, 18, 18,
18, 19, 19, 22, 22, 22, 23, 24
Classes   Class Boundaries         Frequency   Relative Frequency
5–9       4.5 to less than 9.5     4           4/33 ≈ 0.121
10–14     9.5 to less than 14.5    10          10/33 ≈ 0.303
15–19     14.5 to less than 19.5   14          14/33 ≈ 0.424
20–24     19.5 to less than 24.5   5           5/33 ≈ 0.152
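[Figure: histogram of the marks, with the classes on the horizontal axis and frequency on the vertical axis; bar heights 4, 10, 14 and 5.]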
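The class frequencies for this example can be reproduced by counting how many marks fall between each pair of class boundaries. The sketch below is an illustration only; the helper name `grouped_frequencies` is an assumption and does not come from the notes.

```python
# Marks from the example (maximum possible mark is 25).
marks = [5, 6, 6, 8, 11, 11, 11, 12, 12, 13, 14, 14, 14, 14, 15,
         16, 16, 16, 16, 16, 16, 17, 18, 18, 18, 18, 19, 19,
         22, 22, 22, 23, 24]

# Class boundaries: each class runs from one boundary to less than the next.
boundaries = [4.5, 9.5, 14.5, 19.5, 24.5]


def grouped_frequencies(values, bounds):
    """Count the values in each class [bounds[i], bounds[i + 1])."""
    counts = [0] * (len(bounds) - 1)
    for v in values:
        for i in range(len(bounds) - 1):
            if bounds[i] <= v < bounds[i + 1]:
                counts[i] += 1
                break
    return counts


frequencies = grouped_frequencies(marks, boundaries)

# Relative frequency = class frequency / total number of values.
relative = [f / len(marks) for f in frequencies]

print(frequencies)
print([round(r, 3) for r in relative])
```

The relative frequencies always sum to 1, so multiplying each by 100 gives the percentage distribution.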
Definition A frequency polygon is a graph formed by joining the midpoints of the tops of successive bars in
a histogram with straight lines.
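Using the above histogram, we can insert the frequency polygon as shown below.
[Figure: the histogram with the frequency polygon overlaid; the horizontal axis is marked at the class midpoints 2, 7, 12, 17, 22, 27 and the vertical axis at 4, 5, 10, 14.]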
Cumulative Frequency
A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.
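Example From the previous example, we obtain the following cumulative frequency distribution.
Class Boundaries         Cumulative Frequency
4.5 to less than 9.5     4
4.5 to less than 14.5    4 + 10 = 14
4.5 to less than 19.5    4 + 10 + 14 = 28
4.5 to less than 24.5    4 + 10 + 14 + 5 = 33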
Definition An ogive is a graph drawn by joining with straight lines the dots marked above the upper
boundaries of the classes at heights equal to the cumulative frequencies of the respective classes.
[Figure: ogive for the cumulative frequency distribution, with the vertical axis marked at 14, 28 and 33.]
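The cumulative frequencies are running totals of the class frequencies, and the ogive is drawn through the points (upper boundary, cumulative frequency). The sketch below shows one way to compute these points. Starting the ogive at the lower boundary of the first class with height 0 is a common convention and is assumed here; it is not stated in the definition above.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from the marks example.
upper_boundaries = [9.5, 14.5, 19.5, 24.5]
frequencies = [4, 10, 14, 5]

# Running totals: number of values below each upper boundary.
cumulative = list(accumulate(frequencies))

# Points to join with straight lines; the first point (4.5, 0) is an
# assumed starting point at the lower boundary of the first class.
ogive_points = [(4.5, 0)] + list(zip(upper_boundaries, cumulative))

print(cumulative)
print(ogive_points)
```

The last cumulative frequency equals the total number of values, here 33.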
Example Students got the following marks in a test: 75, 52, 80, 96, 71, 53, 78, 81, 75, 59, 57, 52
Stem-and-leaf display (stem | leaves):
5 | 2 3 9 7 2
6 |
7 | 5 1 8 5
8 | 0 1
9 | 6
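The display for these marks can be built by splitting each mark into a tens digit (the stem) and a units digit (the leaf). The following sketch is illustrative; the function name `stem_and_leaf` is an assumption, and it handles two-digit whole numbers only.

```python
from collections import defaultdict

# Marks from the example.
marks = [75, 52, 80, 96, 71, 53, 78, 81, 75, 59, 57, 52]


def stem_and_leaf(values):
    """Group units digits (leaves) under their tens digit (stem)."""
    stems = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    # Keep stems with no leaves so that gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}


display = stem_and_leaf(marks)

for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The leaves are kept in the order in which the marks were recorded; sorting each list of leaves gives a ranked display.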
Bar Graphs
Example 30 employees of a company were asked how stressful their job was and the following
frequency distribution drawn up to illustrate their responses:
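Bar chart
[Figure: bar chart of the employees' responses, with frequency on the vertical axis; tick marks at 10 and 14.]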
The bars are of the same width and with equal spacing.
Definition The mode of a list of numbers is the value that occurs most often. There may be more than
one mode if more than one value occurs the maximum number of times.
Definition The median of an odd number of values is the one in the middle when the numbers are
written in ascending order. The median of an even number of values is the average of the two in the
middle when the numbers are written in ascending order.
Example 3, 4, 9, 9, 9, 10
\[ \text{median} = \frac{9 + 9}{2} = 9 \]
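The mode and median of the example can be checked with Python's standard `statistics` module. This is a minimal sketch: `multimode` (available from Python 3.8) returns every value that is tied for the highest count, which matches the definition of the mode given above.

```python
from statistics import median, multimode

values = [3, 4, 9, 9, 9, 10]

# All values that occur the maximum number of times.
modes = multimode(values)

# For an even number of values, the average of the two middle values.
med = median(values)

print(modes)
print(med)
```

For a list such as 1, 1, 2, 2, 3 the same call returns both 1 and 2, since each occurs twice.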
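Population Variance The population variance of a population $x_1, x_2, \ldots, x_N$ is
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
where $\mu$ represents the mean of the population.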
Sample Variance Given a sample $x_1, x_2, \ldots, x_n$ from some population, the sample variance is
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
where $\bar{x}$ is the sample mean.
standard deviation for sample = s = square root of variance for sample
http://www.uvm.edu/~dhowell/SeeingStatisticsApplets/N-1.html
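Shortcut Formulae for population variance and sample variance for ungrouped data
\[ \sigma^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{N}}{N} \]
\[ s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1} \]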
Example Consider the sample 82, 95, 67, 92.
$\bar{x} = 84$
$x$    $x - \bar{x}$
82     82 − 84 = −2
95     95 − 84 = 11
67     67 − 84 = −17
92     92 − 84 = 8
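To carry this example through to the variance, the deviations are squared, summed and divided by n − 1. The sketch below does this and also evaluates the shortcut formula, which should give the same value. The variable names are illustrative and not from the notes.

```python
from math import sqrt

sample = [82, 95, 67, 92]
n = len(sample)

# Sample mean.
mean = sum(sample) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in sample]
s2_definition = sum(squared_deviations) / (n - 1)

# Shortcut: (sum of x^2 - (sum of x)^2 / n) / (n - 1).
s2_shortcut = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)

# Sample standard deviation.
s = sqrt(s2_definition)

print(mean, sum(squared_deviations))
print(round(s2_definition, 2), round(s2_shortcut, 2), round(s, 2))
```

The squared deviations are 4, 121, 289 and 64, which sum to 478, so the sample variance is 478/3.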
Example
The following table gives the daily commuting times in minutes from home to work for all 25 employees
of a company.
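Mean for grouped data, where $m$ is the midpoint and $f$ the frequency of each class:
\[ \mu = \frac{\sum mf}{N} = \frac{535}{25} = 21.4 \]
Variance for grouped data:
\[ \sigma^2 = \frac{\sum f (m - \mu)^2}{N} \]
\[ s^2 = \frac{\sum f (m - \bar{x})^2}{n - 1} \]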
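Shortcut Formulae
\[ \sigma^2 = \frac{\sum m^2 f - \frac{\left(\sum mf\right)^2}{N}}{N} \]
\[ s^2 = \frac{\sum m^2 f - \frac{\left(\sum mf\right)^2}{n}}{n - 1} \]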
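$N = 25$, $\sum mf = 535$, $\sum m^2 f = 14{,}825$
\[ \sigma^2 = \frac{\sum m^2 f - \frac{\left(\sum mf\right)^2}{N}}{N} = \frac{14{,}825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04 \]
standard deviation $= \sqrt{135.04} \approx 11.62$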
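The grouped-data calculation can be sketched in Python as follows. The class midpoints and frequencies below are assumed values, chosen only because they reproduce the totals quoted in the example (N = 25, Σmf = 535, Σm²f = 14,825); they should not be read as the original commuting-time table.

```python
from math import sqrt

# Assumed class midpoints (minutes) and frequencies; hypothetical values
# that match the totals quoted in the example.
midpoints = [5, 15, 25, 35, 45]
frequencies = [4, 9, 6, 4, 2]

N = sum(frequencies)
sum_mf = sum(m * f for m, f in zip(midpoints, frequencies))
sum_m2f = sum(m * m * f for m, f in zip(midpoints, frequencies))

# Mean for grouped data.
mu = sum_mf / N

# Population variance from the definition and from the shortcut formula.
var_definition = sum(f * (m - mu) ** 2
                     for m, f in zip(midpoints, frequencies)) / N
var_shortcut = (sum_m2f - sum_mf ** 2 / N) / N

sd = sqrt(var_shortcut)

print(N, sum_mf, sum_m2f)
print(round(mu, 2), round(var_shortcut, 2), round(sd, 2))
```

Both forms of the variance agree, which is a useful check when working by hand with the shortcut formula.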
Quartiles
Quartiles are three summary measures that divide a ranked data set into four equal parts. The second
quartile is the same as the median of a data set. The first quartile is the value of the middle term among
the observations that are less than the median, and the third quartile is the value of the middle term
among the observations that are greater than the median.
First quartile = Q1
Second quartile = Q2
Third quartile = Q3
Interquartile range = Q3 − Q1
Example Consider the values 2, 4, 5, 6, 8, 10, 14
Second quartile = 6
First quartile = 4
Third quartile = 10
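The quartiles in this example follow the definition above: find the median, then take the median of the values ranked below it and the median of the values ranked above it. The sketch below implements that rule by position in the ranked list. The function name `quartiles` is illustrative, and other textbooks and software use slightly different quartile conventions, so results may differ from library defaults.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    q2 = median(data)
    # Lower and upper halves by rank; for an odd number of values the
    # middle term itself belongs to neither half.
    lower = data[: n // 2]
    upper = data[(n + 1) // 2:]
    return median(lower), q2, median(upper)


q1, q2, q3 = quartiles([2, 4, 5, 6, 8, 10, 14])
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

For these seven values the interquartile range is 10 − 4 = 6.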
Second quartile = 8
First quartile = (4+5)/2 = 4.5
Third quartile = (14+15)/2 = 14.5
First quartile = Q1 = 37
Second quartile = Q2 = 47
Third quartile = Q3 = 61
Interquartile range = IQR = Q3 − Q1 = 61 − 37 = 24
Upper inner fence = Q3 + 1.5(IQR) = 61 + 36 = 97
Lower inner fence = Q1 − 1.5(IQR) = 37 − 36 = 1
Smallest value within the two inner fences = 29
Largest value within the two inner fences = 72
A mild outlier lies outside one of the two inner fences but within the two outer fences.
An extreme outlier lies outside one of the two outer fences.
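The fence calculations and the outlier rules can be sketched as a small classifier. The notes do not give a formula for the outer fences; the sketch assumes the common convention Q1 − 3(IQR) and Q3 + 3(IQR). The function name `classify` and the test values passed to it are illustrative only.

```python
def classify(x, q1, q3):
    """Label x as an extreme outlier, a mild outlier, or neither."""
    iqr = q3 - q1
    # Inner fences, as in the example: 1.5 * IQR beyond the quartiles.
    lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Outer fences: assumed here to lie 3 * IQR beyond the quartiles.
    lower_outer, upper_outer = q1 - 3 * iqr, q3 + 3 * iqr
    if x < lower_outer or x > upper_outer:
        return "extreme outlier"
    if x < lower_inner or x > upper_inner:
        return "mild outlier"
    return "not an outlier"


# Quartiles from the example: Q1 = 37, Q3 = 61, so the inner fences are 1 and 97.
for value in (72, 105, 140):
    print(value, classify(value, 37, 61))
```

With these quartiles the assumed outer fences are −35 and 133, so a value between 97 and 133 is a mild outlier and a value above 133 is an extreme outlier.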
[Figure: box-and-whisker plot drawn on a horizontal axis marked 25, 35, 45, 55, 65, 75, 85, 105.]