
MATH2016 - 2021 - S2 Notes Week 1

This document provides definitions and examples of key concepts in statistics including populations, samples, variables, descriptive and inferential statistics, frequency distributions, histograms, cumulative frequencies, measures of central tendency, and variance. Formulas for calculating variances and means for both populations and samples are also presented.

Uploaded by Forza Bee
Copyright © All Rights Reserved

MATH2026

Week 1 & 2

Statistics is a collection of methods for collecting, analyzing, presenting and interpreting data and for
making decisions.

Definition A population consists of all elements whose characteristics are of interest.

Definition A sample is a portion of the population selected for study.

Definition An element or member of a sample is a specific subject or object about which data is
collected.

Example If we are interested in the set of all cars in a city, this set will be the population. To obtain
information about this population, we may select some cars from the population and study them. This
subset of cars would be the sample.

Definition Descriptive statistics consists of methods for organizing, displaying and describing data by
using tables, graphs and summary measures.

Definition Inferential statistics consists of methods that use sample results to help make decisions or
predictions about a population.

Example In the previous example, we may obtain the ages of cars in the sample to get an idea of the
ages of cars in the population.

Definition A variable is a characteristic under study that assumes different values for different
elements.

Example Age of a car in the first example.

Definition The value of a variable for an element is called an observation or measurement.

Definition A data set is a collection of observations on one or more variables.

Definition A quantitative variable is one that can be measured numerically.

Example The weight of a bolt.

Definition A qualitative variable cannot assume a numerical value, but can be classified into two or more
nonnumeric categories.

Example The colour of an object.

Definition A variable whose values are countable is called a discrete variable.

Example The number of students in a class.

Definition A variable that can assume any numerical value over a certain interval or intervals is called a
continuous variable.

Example The mass of a tyre.

Frequency Distribution

Example In a test, students obtained the following marks out of a maximum of 10: 5, 6, 9, 3, 8, 8, 9, 10,
6, 8, 5, 8, 9, 3, 3, 3, 3, 8, 8, 3

Frequency distribution

Mark   Frequency (f)
3      6
4      0
5      2
6      2
7      0
8      6
9      3
10     1

Note The frequency of 3 is 6, the frequency of 4 is 0 and so on.

Note For a small number of discrete values, a frequency distribution like the above one is suitable.
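As a minimal Python sketch (the variable names are illustrative, not from the notes), the frequency table above can be tallied from the raw marks. The table is assumed to run from the smallest to the largest observed mark, so that marks with frequency 0, such as 4 and 7, still appear.

```python
from collections import Counter

# Marks out of 10 from the example above
marks = [5, 6, 9, 3, 8, 8, 9, 10, 6, 8, 5, 8, 9, 3, 3, 3, 3, 8, 8, 3]

# Counter tallies how often each mark occurs; looking up a missing key returns 0
counts = Counter(marks)

# One row per mark from the smallest to the largest observed value
freq_table = {mark: counts[mark] for mark in range(min(marks), max(marks) + 1)}

for mark, f in freq_table.items():
    print(mark, f)
```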

Frequency distribution for grouped data

Histogram A histogram is a graph with the classes on the horizontal axis and the frequencies (or relative
frequencies or percentages) on the vertical axis.

Definition The relative frequency of a class is

$$\text{relative frequency} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}}$$

Percentage = (relative frequency) × 100%

Example Students wrote a test, for which the maximum possible mark was 25. The marks obtained by
the students were 5, 6, 6, 8, 11, 11, 11, 12, 12, 13, 14, 14, 14, 14, 15, 16, 16, 16, 16, 16, 16, 17, 18, 18, 18,
18, 19, 19, 22, 22, 22, 23, 24

Classes   Class Boundaries         Frequency   Relative Frequency
5–9       4.5 to less than 9.5     4           4/33 ≈ 0.121
10–14     9.5 to less than 14.5    10          10/33 ≈ 0.303
15–19     14.5 to less than 19.5   14          14/33 ≈ 0.424
20–24     19.5 to less than 24.5   5           5/33 ≈ 0.152

The histogram corresponding to the above table is drawn below

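[Figure: histogram with class boundaries 4.5, 9.5, 14.5, 19.5 and 24.5 on the horizontal axis and frequency on the vertical axis; bar heights 4, 10, 14 and 5]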
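A hedged sketch of how the grouped table could be computed from the raw marks. Each class is assumed to include its lower boundary and exclude its upper boundary ("4.5 to less than 9.5"), and the relative frequencies and percentages follow the definitions above; the variable names are illustrative.

```python
# Marks out of 25 from the example above
marks = [5, 6, 6, 8, 11, 11, 11, 12, 12, 13, 14, 14, 14, 14, 15, 16, 16, 16,
         16, 16, 16, 17, 18, 18, 18, 18, 19, 19, 22, 22, 22, 23, 24]

# Class boundaries: each class runs from one boundary up to (but not including) the next
boundaries = [4.5, 9.5, 14.5, 19.5, 24.5]

frequencies = []
for lower, upper in zip(boundaries, boundaries[1:]):
    frequencies.append(sum(1 for x in marks if lower <= x < upper))

total = sum(frequencies)
relative = [f / total for f in frequencies]   # frequency / sum of all frequencies
percentages = [100 * r for r in relative]     # relative frequency x 100%

for lower, upper, f, r, p in zip(boundaries, boundaries[1:], frequencies, relative, percentages):
    print(f"{lower} to less than {upper}: f = {f}, relative = {r:.3f}, {p:.1f}%")
```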

Definition A frequency polygon is a graph formed by joining the midpoints of the tops of successive bars in
a histogram with straight lines.

Using the above histogram, we can insert the frequency polygon as shown below

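[Figure: the same histogram with the frequency polygon drawn through the class midpoints 7, 12, 17 and 22, and brought down to the horizontal axis at 2 and 27]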
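The vertices of the frequency polygon can be sketched as follows. The sketch assumes equal class widths and, to bring the polygon down to the horizontal axis at 2 and 27, adds an empty class of the same width at each end.

```python
boundaries = [4.5, 9.5, 14.5, 19.5, 24.5]
frequencies = [4, 10, 14, 5]

# Midpoint of each class = average of its two boundaries
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Assumption: close the polygon with one empty class (frequency 0) of equal width at each end
width = boundaries[1] - boundaries[0]
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

polygon = list(zip(xs, ys))  # points to join with straight lines
print(polygon)
```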

Cumulative Frequency

A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

Example From the previous example, we obtain the following cumulative frequency distribution.

Class Class Boundaries Cumulative Frequency

5–9 4.5 to less than 9.5 4

10 – 14 4.5 to less than 14.5 14

15 – 19 4.5 to less than 19.5 28

20 – 24 4.5 to less than 24.5 33

Definition An ogive is a graph drawn by joining with straight lines the dots marked above the upper
boundaries of classes at heights equal to the cumulative frequencies of the respective classes.

Example An ogive corresponding to the example above is drawn below.

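[Figure: ogive with class boundaries 4.5, 9.5, 14.5, 19.5 and 24.5 on the horizontal axis and cumulative frequency on the vertical axis, with heights 14, 28 and 33 marked]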
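The cumulative frequencies and the points of the ogive can be sketched together. Starting the ogive at height 0 above the lower boundary 4.5 of the first class is an assumption of this sketch (no marks fall below 4.5); the remaining points follow the definition above.

```python
from itertools import accumulate

lower_boundary = 4.5                        # lower boundary of the first class
upper_boundaries = [9.5, 14.5, 19.5, 24.5]  # upper boundary of each class
frequencies = [4, 10, 14, 5]

# Running totals give the cumulative frequency of each class
cumulative = list(accumulate(frequencies))

# Assumed starting point at height 0, then one point above each upper boundary
ogive = [(lower_boundary, 0)] + list(zip(upper_boundaries, cumulative))

for x, y in ogive:
    print(x, y)
```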

Stem and Leaf Display

Example Students got the following marks in a test: 75, 52, 80, 96, 71, 53, 78, 81, 75, 59, 57, 52

The stem and leaf display for this data is

5 | 2 3 9 7 2
6 |
7 | 5 1 8 5
8 | 0 1
9 | 6
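A minimal sketch that builds the same display, assuming two-digit data with the tens digit as the stem and the units digit as the leaf. Leaves are kept in the order the marks appear, as in the display above, and empty stems (here 6) are listed so that gaps in the data stay visible.

```python
from collections import defaultdict

marks = [75, 52, 80, 96, 71, 53, 78, 81, 75, 59, 57, 52]

# Split each two-digit mark into stem (tens digit) and leaf (units digit),
# keeping leaves in the order the marks appear
leaves = defaultdict(list)
for mark in marks:
    stem, leaf = divmod(mark, 10)
    leaves[stem].append(leaf)

# List every stem from the smallest to the largest, including empty ones
display = {stem: leaves.get(stem, []) for stem in range(min(leaves), max(leaves) + 1)}

for stem, stem_leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in stem_leaves)}")
```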

Bar Graphs
Example 30 employees of a company were asked how stressful their job was, and the following
frequency distribution was drawn up to illustrate their responses:

Stress on Job        Frequency
Very stressful       10
Somewhat stressful   14
Not stressful        6

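Bar chart

[Figure: bar chart with bars labelled "very", "somewhat" and "none" on the horizontal axis and frequency on the vertical axis]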

“very” means very stressful


“somewhat” means somewhat stressful
“none” means not stressful

The bars have the same width and are equally spaced.

Definition The mean of a list of numbers is the arithmetic average.

Definition The mode of a list of numbers is the one that occurs most often. There may be more than
one mode if more than one value occurs the maximum number of times.

Definition The median of an odd number of values is the one in the middle when the numbers are
written in ascending order. The median of an even number of values is the average of the two in the
middle when the numbers are written in ascending order.

Example 3, 4, 9, 9, 9, 10

$$\text{median} = \frac{9 + 9}{2} = 9$$
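As a quick check, Python's standard `statistics` module provides all three measures; this sketch applies them to the example values (`statistics.multimode` needs Python 3.8 or later).

```python
import statistics

values = [3, 4, 9, 9, 9, 10]

mean = statistics.mean(values)        # arithmetic average: 44 / 6
median = statistics.median(values)    # even count, so average of the two middle values
modes = statistics.multimode(values)  # every value that occurs the maximum number of times

print(mean, median, modes)
```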

Variance of Population Given a population $\{x_1, x_2, \ldots, x_N\}$, the population variance is

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where $\mu$ represents the mean of the population.

Standard deviation for population = $\sigma$ = square root of variance for population

Sample Variance Given a sample $\{x_1, x_2, \ldots, x_n\}$ from some population, the sample variance is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where $\bar{x}$ is the sample mean.
Standard deviation for sample = $s$ = square root of variance for sample
http://www.uvm.edu/~dhowell/SeeingStatisticsApplets/N-1.html
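The two definitions can be written directly as functions. This is a sketch with hypothetical function names; the data reuse the values 3, 4, 9, 9, 9, 10 from the median example purely for illustration. Python's `statistics.pvariance` divides by N and `statistics.variance` divides by n - 1, so they serve as a cross-check.

```python
import math
import statistics

def population_variance(xs):
    """sigma^2 = sum of (x_i - mu)^2 divided by N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

def sample_variance(xs):
    """s^2 = sum of (x_i - xbar)^2 divided by n - 1 (needs n >= 2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [3, 4, 9, 9, 9, 10]

sigma = math.sqrt(population_variance(data))  # population standard deviation
s = math.sqrt(sample_variance(data))          # sample standard deviation

# Cross-check against the standard library
assert math.isclose(population_variance(data), statistics.pvariance(data))
assert math.isclose(sample_variance(data), statistics.variance(data))
print(sigma, s)
```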

Shortcut Formulae for population variance and sample variance for ungrouped data

$$\sigma^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{N}}{N}
\qquad
s^2 = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n - 1}$$
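A sketch checking that the shortcut formulae agree with the definitions, using the sample from the example that follows; the function names are hypothetical. In floating-point arithmetic the shortcut form can lose precision when the values are large compared with their spread.

```python
def shortcut_population_variance(xs):
    """(sum x^2 - (sum x)^2 / N) / N"""
    N = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / N) / N

def shortcut_sample_variance(xs):
    """(sum x^2 - (sum x)^2 / n) / (n - 1)"""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def definition_sample_variance(xs):
    """sum of (x - xbar)^2 / (n - 1), for comparison"""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

sample = [82, 95, 67, 92]
print(shortcut_sample_variance(sample), definition_sample_variance(sample))
```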
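Example Consider the sample 82, 95, 67, 92.

$$\bar{x} = \frac{82 + 95 + 67 + 92}{4} = \frac{336}{4} = 84$$

x     $x - \bar{x}$
82    82 - 84 = -2
95    95 - 84 = 11
67    67 - 84 = -17
92    92 - 84 = 8

$$\text{Variance} = s^2 = \frac{(-2)^2 + (11)^2 + (-17)^2 + (8)^2}{4 - 1} = \frac{478}{3} \approx 159.33$$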
The standard deviation for a population is the square root of the population variance, and the standard
deviation for a sample is the square root of the sample variance.
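The worked example can be reproduced step by step. This sketch also cross-checks the result against Python's `statistics.variance` and `statistics.stdev`, which use the n - 1 denominator.

```python
import math
import statistics

sample = [82, 95, 67, 92]

xbar = statistics.mean(sample)                  # sample mean, 84
deviations = [x - xbar for x in sample]         # x - xbar for each value
s2 = sum(d ** 2 for d in deviations) / (len(sample) - 1)
s = math.sqrt(s2)

# Cross-check against the standard library, which also divides by n - 1
assert math.isclose(s2, statistics.variance(sample))
assert math.isclose(s, statistics.stdev(sample))

print(round(s2, 2), round(s, 2))  # prints: 159.33 12.62
```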

Mean for Grouped Data


$$\text{Mean for population data: } \mu = \frac{\sum mf}{N}
\qquad
\text{Mean for sample data: } \bar{x} = \frac{\sum mf}{n}$$
where m is the midpoint and f is the frequency of the class.

Example
The following table gives the daily commuting times in minutes from home to work for all 25 employees
of a company.

Daily commuting time (minutes)   Number of employees
0 to less than 10                4
10 to less than 20               9
20 to less than 30               6
30 to less than 40               4
40 to less than 50               2

Daily commuting time (minutes)   f   m    mf
0 to less than 10                4   5    20
10 to less than 20               9   15   135
20 to less than 30               6   25   150
30 to less than 40               4   35   140
40 to less than 50               2   45   90
N = 25                                    $\sum mf = 535$

$$\mu = \frac{\sum mf}{N} = \frac{535}{25} = 21.4 \text{ minutes}$$
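The grouped mean can be sketched as below, with each class represented by its midpoint m as the formula assumes. Because the table covers all 25 employees, the result is the population mean. The layout of `classes` as (lower, upper, frequency) tuples is an illustrative choice.

```python
# (lower limit, upper limit, frequency) for each class of the commuting-time table
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]

# Class midpoint m = (lower + upper) / 2
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

N = sum(freqs)
sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
mu = sum_mf / N

print(N, sum_mf, mu)
```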

Variance and Standard Deviation for Grouped Data

$$\sigma^2 = \frac{\sum f(m - \mu)^2}{N}
\qquad
s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}$$

Shortcut Formulae

$$\sigma^2 = \frac{\sum m^2 f - \frac{\left(\sum mf\right)^2}{N}}{N}
\qquad
s^2 = \frac{\sum m^2 f - \frac{\left(\sum mf\right)^2}{n}}{n - 1}$$

Daily commuting time (minutes)   f   m    mf    m²f
0 to less than 10                4   5    20    100
10 to less than 20               9   15   135   2025
20 to less than 30               6   25   150   3750
30 to less than 40               4   35   140   4900
40 to less than 50               2   45   90    4050
N = 25                                    $\sum mf = 535$   $\sum m^2 f = 14{,}825$

$$\sigma^2 = \frac{\sum m^2 f - \frac{\left(\sum mf\right)^2}{N}}{N} = \frac{14{,}825 - \frac{(535)^2}{25}}{25} = \frac{3376}{25} = 135.04$$

Standard deviation = $\sigma = \sqrt{135.04} \approx 11.62$
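A sketch of the grouped-data calculation, applying the shortcut formula for a population and cross-checking it against the definitional form. The class layout is an illustrative choice; midpoints stand in for the individual values, as the formulae assume.

```python
import math

# (lower limit, upper limit, frequency) for each class of the commuting-time table
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]

m = [(lower + upper) / 2 for lower, upper, _ in classes]  # class midpoints
f = [freq for _, _, freq in classes]                      # class frequencies

N = sum(f)
sum_mf = sum(mi * fi for mi, fi in zip(m, f))
sum_m2f = sum(mi * mi * fi for mi, fi in zip(m, f))

# Shortcut formula for a population (all 25 employees)
var_shortcut = (sum_m2f - sum_mf ** 2 / N) / N

# Definitional formula, sum of f(m - mu)^2 over N, as a cross-check
mu = sum_mf / N
var_definition = sum(fi * (mi - mu) ** 2 for mi, fi in zip(m, f)) / N

sigma = math.sqrt(var_shortcut)
print(round(var_shortcut, 2), round(sigma, 2))  # prints: 135.04 11.62
```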

Quartiles

Quartiles are three summary measures that divide a ranked data set into four equal parts. The second
quartile is the same as the median of a data set. The first quartile is the value of the middle term (or the
average of the two middle terms) among the observations that are less than the median, and the third
quartile is the value of the middle term (or the average of the two middle terms) among the observations
that are greater than the median.
First quartile = Q1

Second quartile = Q2

Third quartile = Q3

Interquartile range = Q3 - Q1
Example Consider the values 2, 4, 5, 6, 8, 10, 14

Second quartile = 6
First quartile = 4
Third quartile = 10

Example Consider the values 2, 4, 5, 6, 8, 10, 14, 15

Second quartile = (6+8)/2 = 7


First quartile = (4+5)/2 = 4.5
Third quartile = (10+14)/2 = 12

Example Consider the values 2, 4, 5, 6, 8, 10, 14, 15, 17

Second quartile = 8
First quartile = (4+5)/2 = 4.5
Third quartile = (14+15)/2 = 14.5
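The quartile rule in these notes (split the ranked data into the values below and above the median, leaving out the median itself when the count is odd) can be sketched as below; the function name is hypothetical. This is only one of several quartile conventions: Python's `statistics.quantiles`, for example, interpolates by default and can give different values, so the sketch implements the halves rule directly.

```python
import statistics

def quartiles(xs):
    """Return (Q1, Q2, Q3) using the 'median of each half' rule from these notes."""
    s = sorted(xs)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = s[:half]            # observations below the median
    upper = s[half + n % 2:]    # observations above the median (middle value skipped when n is odd)
    return statistics.median(lower), statistics.median(s), statistics.median(upper)

for data in ([2, 4, 5, 6, 8, 10, 14],
             [2, 4, 5, 6, 8, 10, 14, 15],
             [2, 4, 5, 6, 8, 10, 14, 15, 17]):
    print(data, quartiles(data))
```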

Box and Whisker Plot


This is a plot that shows the centre, spread and skewness of a data set. It is constructed by drawing a
box and two whiskers that use the median, the first quartile, the third quartile and the smallest and the
largest values in the data set between the lower and upper inner fences.
Example Consider the sample {35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58}
We are going to draw a box plot for this sample.

First quartile = Q1 = 37

Second quartile = Q2 = 47

Third quartile = Q3 = 61

Interquartile range = IQR = Q3 - Q1 = 61 - 37 = 24
Upper inner fence = Q3 + 1.5(IQR) = 61 + 36 = 97
Lower inner fence = Q1 - 1.5(IQR) = 37 - 36 = 1
Smallest value within the two inner fences = 29
Largest value within the two inner fences = 72

Upper outer fence = Q3 + 3.0(IQR) = 61 + 72 = 133

Lower outer fence = Q1 - 3.0(IQR) = 37 - 72 = -35

A mild outlier is outside either of the two inner fences but within either of the two outer fences.
An extreme outlier is outside either of the two outer fences.

104 is a mild outlier in this example. It is represented by the asterisk.

[Figure: box-and-whisker plot on a horizontal axis from 25 to 105, labelling the smallest value within the two inner fences, the first quartile, the median, the third quartile, the largest value within the two inner fences, and an outlier]
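A sketch of the fence calculation for this sample, using the same "median of each half" quartile rule as the notes. Treating a value that lands exactly on a fence as inside that fence is an assumption of this sketch; the notes do not say how boundary values are classified.

```python
import statistics

sample = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]

s = sorted(sample)
half = len(s) // 2
q1 = statistics.median(s[:half])                # median of the lower half
q2 = statistics.median(s)                       # median
q3 = statistics.median(s[half + len(s) % 2:])   # median of the upper half
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # lower and upper inner fences
outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # lower and upper outer fences

def inside(x, fences):
    # Assumption: a value exactly on a fence counts as inside it
    return fences[0] <= x <= fences[1]

# Whiskers reach the smallest and largest values within the inner fences
within_inner = [x for x in s if inside(x, inner)]
whisker_low, whisker_high = min(within_inner), max(within_inner)

mild = [x for x in s if not inside(x, inner) and inside(x, outer)]
extreme = [x for x in s if not inside(x, outer)]

print(q1, q2, q3, iqr, inner, outer)
print(whisker_low, whisker_high, mild, extreme)
```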
