Chapter 1 SSTS001 2024
INTRODUCTION
AND DEFINITIONS
CONTENTS
1.1 What is statistics
1.2 The language of statistics
– Random variable and its data
– Population and its measure
– Sample and its measure
1.3 Components of Statistics
– Descriptive statistics
– Inferential statistics
– Statistical modelling
1.4 Computer and Software for statistical analysis
1.1. What is statistics
• Statistics can be defined as a set of
mathematically based methods and
techniques that transform sets of raw
(unprocessed) data into a few meaningful
summary measures, relationships,
patterns and trends, which then convey
useful and usable information to support
decision making.
Example 1
Raw data: Colour preference of SSTS000 students
Frequency Table
Black 18
Blue 11
Green 6
Yellow 5
Total 40
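A frequency table like this one is obtained by counting how often each value occurs in the raw data. The sketch below is a minimal Python illustration; the raw list is an assumption, rebuilt from the table counts because the original student responses are not shown.

```python
from collections import Counter

# Hypothetical raw data: one colour per student, rebuilt from the table counts.
raw_data = ["Black"] * 18 + ["Blue"] * 11 + ["Green"] * 6 + ["Yellow"] * 5

frequency = Counter(raw_data)      # maps each colour to its count
total = sum(frequency.values())    # number of students who responded

# Print the frequency table with each colour's share of the total.
for colour, count in frequency.most_common():
    print(f"{colour:<8}{count:>4}{100 * count / total:>8.1f}%")
print(f"{'Total':<8}{total:>4}")
```

The percentage column is the count divided by the total, which is the information a pie chart of the same data displays.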
[Bar chart: Colour preference. Black 18, Blue 11, Green 6, Yellow 5]
[Pie chart: Colour preference. Black 45%, Blue 27%, Green 15%, Yellow 13%]
Can the above information help you to make
a decision?
Example 2
Raw data: Ages of people in a village
 2  13  87   6
 3  12  14   7
79  40  69   2
85   5  11   5
90  20   5   5

Results (Column1)
Mean                 28
Standard Error       7,44241
Median               11,5
Mode                 5
Standard Deviation   33,28347
Sample Variance      1107,789
Kurtosis             -0,58675
Skewness             1,105807
Range                88
Minimum              2
Maximum              90
Sum                  560
Count                20
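Most of the summary measures in the results table can be recomputed from the 20 raw ages. The sketch below uses Python's standard `statistics` module; the variable names are illustrative, and kurtosis and skewness are left out because the standard library has no functions for them.

```python
import math
import statistics

# The 20 ages from the raw data, read row by row.
ages = [2, 13, 87, 6, 3, 12, 14, 7, 79, 40, 69, 2, 85, 5, 11, 5, 90, 20, 5, 5]

mean = statistics.mean(ages)
median = statistics.median(ages)
mode = statistics.mode(ages)
variance = statistics.variance(ages)          # sample variance (divides by n - 1)
std_dev = statistics.stdev(ages)              # square root of the sample variance
std_error = std_dev / math.sqrt(len(ages))    # standard error of the mean
data_range = max(ages) - min(ages)

print("Mean", mean, "Median", median, "Mode", mode)
print("Sample variance", round(variance, 3), "Standard deviation", round(std_dev, 5))
print("Standard error", round(std_error, 5), "Range", data_range)
print("Sum", sum(ages), "Count", len(ages))
```

Python prints a decimal point where the results table uses a decimal comma.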
Statistical analysis in decision making
Data [raw values] → Statistical analysis [transformation process] → Information [statistical summary measures, relationships, patterns, trends] → Management decision making
1.2 The language of statistics
• A random variable is any attribute or characteristic being
measured or observed.
Examples
✓ Ages of employees in a company
✓ Students' hair colours at UL
✓ Distance travelled per day by delivery vehicles
✓ Country of origin of foreign students at UL
Cont.…
• Data are the actual values (numbers) or outcomes
recorded on a random variable.
Examples
✓ Ages of employees in a company
❖ Years {25, 30, 45, 65, 20, 33}
✓ A person’s hair colour
❖ Colour {blond, brunette, black, etc.}
✓ Distance travelled per day by delivery vehicles
❖ km {22, 7, 45, 6, 26}
✓ Country of origin of foreign students at UL
❖ Country {USA, Zimbabwe, Malawi, etc.}
Population vs. Sample
[Figure: Population and Sample]
Cont.
• A population (or universe)
- is the set of all items of interest in a statistical problem.
Or
- represents every possible item that contains a data
value (measurement or observation) of the random
variable under study.
Examples
✓ Ages of all employees in a company
✓ Hair colours of all students at UL
Cont.
• A population parameter is the actual value
of a measure of a random variable in a population,
computed from all the data values in the population.
Examples
✓ The actual average age of all employees in
a company
✓ The actual average distance travelled per day
by delivery vehicles
Raw data: Ages of all people in a village
 2  13  87   6
 3  12  14   7
79  40  69   2
85   5  11   5
90  20   5   5

Results (Column1)
Mean                 28
Standard Error       7,44241
Median               11,5
Mode                 5
Standard Deviation   33,28347
Sample Variance      1107,789
Kurtosis             -0,58675
Skewness             1,105807
Range                88
Minimum              2
Maximum              90
Sum                  560
Count                20
Cont.
• A sample
is a set of data drawn from the population,
that is, a subset of items drawn from a
population. A descriptive measure of a
sample is called a statistic.
Example
Cont.
• A sample statistic is a value of a random
variable derived from sample data.
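The difference between a population parameter and a sample statistic can be shown with the village ages used earlier. The sketch below treats the 20 ages as the whole population and draws a sample of 5 from it. The sample size and the seed are arbitrary assumptions made for illustration.

```python
import random
import statistics

# The 20 village ages, treated here as the entire population (N = 20).
population = [2, 13, 87, 6, 3, 12, 14, 7, 79, 40,
              69, 2, 85, 5, 11, 5, 90, 20, 5, 5]

rng = random.Random(1)                    # fixed seed so the run is repeatable
sample = rng.sample(population, k=5)      # a sample of n = 5 items

population_mean = statistics.mean(population)   # population parameter
sample_mean = statistics.mean(sample)           # sample statistic

print("Population mean (parameter):", population_mean)
print("Sample:", sample)
print("Sample mean (statistic):", sample_mean)
```

The parameter is a single fixed value, while the statistic changes whenever a different sample is drawn.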
1.3 Components of Statistics
• Descriptive statistics
✓Condenses large volumes of data into a
few summary measures.
• Inferential statistics
✓ Generalises sample findings to a broader
population.
• Statistical modelling
✓ Builds relationships between variables to
make predictions.
1.4 Computer and Software for statistical
analysis
DATA
2.1 Data types
2.2 Data sources
2.3 Data collection methods
Data types
Data are the raw material of statistical
analysis.
Data can be classified in three ways
• Classification 1: categorical versus numeric (or
qualitative versus quantitative)
Classification 1
EXAMPLES OF CATEGORICAL DATA
Random variable      Categories      Codes
Gender               Female          1
                     Male            2
Country of origin    South Africa    1
                     Botswana        2
                     Namibia         3
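Coding categorical data means replacing each category with a numeric label. The sketch below uses the codes from the table; the list of responses is hypothetical and is there only to show the lookup.

```python
# Codes copied from the table above. They are labels, not quantities,
# so arithmetic on them (for example an average) has no meaning.
gender_codes = {"Female": 1, "Male": 2}
country_codes = {"South Africa": 1, "Botswana": 2, "Namibia": 3}

# Hypothetical responses, for illustration only.
responses = ["Botswana", "South Africa", "Namibia", "Botswana"]

# Replace each category with its code.
coded = [country_codes[country] for country in responses]
print(coded)  # [2, 1, 3, 2]
```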
EXAMPLES OF NUMERIC DATA
Random variable               Numeric response range    Numeric data value
Employee years of service     1-20 years                18 years
Classification 2
Nominal
Nominal data is a sub-classification of categorical data,
used for categories of equal importance (no ranking is implied).
Examples
Random variable Nominal response
Classification 2 cont.
Ordinal
Ordinal data is a sub-classification of categorical data in which
there is an implied ranking between categories.
Examples
Random variable ordinal categories Ranked codes
1 2 3 4
Classification 2 cont.
Ratio-scaled data
Ratio-scaled data is pure numeric data. The data values
are derived from direct measurement, where
there is an absolute origin of zero.
Examples
Random variable                      Illustrative numeric data values
Price of milk per litre              R9.50
Number of children in a household    4 children
Classification 3
Discrete data consist of whole numbers
only.
Examples
Random variable                         Discrete values
Discrete numeric
Number of sales transactions per day    43, 86, 118, 211
Number of children in a family          0, 1, 2, 3, 4, 5
Discrete categorical
Gender                                  1 = male, 2 = female
Do you exercise regularly?              1 = no, 2 = yes
Classification 3 Cont.
Continuous data
Continuous data is numeric data that can validly take on any
value in an interval.
Examples
Random variable Valid interval Data values
Univariate data
Data from a study that looks at only
one variable.
Bivariate data
Data from a study that examines the
relationship between two variables.
Data sources
• Internal data refers to the availability of
data from within an organisation.
• Primary data is data which is captured at
the point where it is generated. Such data
is captured for the first time and with a
specific purpose in mind.
Advantages of primary data
• Primary data are directly relevant to the
problem at hand
Disadvantages of primary data
• Primary data could be time consuming to
collect
• They are generally more expensive to
collect
Advantages of secondary data
• Secondary data already exists
• Access time is relatively short, especially if
the data is accessible through the internet.
• Secondary data is generally less
expensive to acquire than primary data.
Disadvantages of secondary
data
• Secondary data may not be problem
specific (relevancy)
• Currency of data: data may be dated
(“old”) and hence inappropriate
• It may be difficult to assess data accuracy
• Secondary data may not be subject to
further manipulation, or not be at the right
level of aggregation
Data collection
methods
• Through observation
• By conducting surveys
❑Personal interviews
❑Postal surveys
❑Telephone interviews
❑E-surveys
• Through experiments
Questions?
Chapter 3
Sampling
Methods
Key Definitions
• A population is the collection of all items of interest or
under investigation
• N represents the population size
• A sample is an observed subset of the population
• n represents the sample size
Population vs. Sample
[Figure: Population and Sample]
Why Sample?
Methods of Sampling
Non-probability
sampling
methods
There are four types of non-probability
sampling methods:
• Convenience sampling
• Judgement sampling
• Quota sampling
• Snowball sampling
Convenience sampling
• When sampling units are selected to suit
the convenience of the researcher,
convenience sampling has been applied.
Respondents are included in the sample if
they happen to be in the right place at the
right time; individuals are chosen
because they are easy to include.
Examples
• Internet polls
• Mail-in customer surveys
Judgement sampling
When the researcher uses personal judgement
alone to select the sampling units that he or she
considers most appropriate for providing data
on the question under study, judgement
sampling has been applied.
Example
• Selecting only labour consultants instead of
selecting from employees in general
Quota sampling
Quota sampling sets a quota of sampling units for each
subgroup of the population. Selection within each
subgroup is non-random (selection is biased).
Examples
• A researcher might ask for a
sample of 100 females, or 100 individuals
between the ages of 20 and 30.
• In a study comparing the academic
performance of different high school class
levels, and its relationship with gender and
socioeconomic status, the researcher first
identifies the subgroups
Snowball sampling
Snowball sampling is used to reach target
populations where the sampling units are difficult
to identify. Under snowball sampling, each
identified member of the target population is
asked to identify other sampling units who
belong to the same target population. The
issues under investigation are usually
confidential or sensitive in nature.
Example
• Identifying AIDS sufferers
Probability
sampling
methods
There are four probability based sampling
methods:
• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Cluster random sampling.
Simple Random Sampling
The sample is chosen by chance alone, so that every member
of the population has an equal chance of being selected.
Examples
• Telephone polling using randomly selected telephone
numbers
• Drawing names out of a hat
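Drawing names out of a hat can be imitated in code. The sketch below is one possible illustration using Python's standard `random` module; the population of 100 numbered units, the sample size of 10 and the seed are all assumptions made for the example.

```python
import random

# Hypothetical population: units labelled 1 to N.
N = 100
population = list(range(1, N + 1))

n = 10                                   # sample size
rng = random.Random(2024)                # fixed seed so the run is repeatable
sample = rng.sample(population, k=n)     # draws n units without replacement

print(sorted(sample))
```

Because `sample` draws without replacement, no unit can be selected twice, just as a name drawn from a hat is not put back.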
Systematic Sampling
The population is placed on a list, a random starting point is
chosen and then every k-th member is selected.
Example
• Choosing a sample of registered voters by choosing
every 25th voter from the county registration roll
• Testing every 300th product from the assembly line
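The procedure can be sketched in a few lines of Python. The function name `systematic_sample`, the roll of 1000 voters and the seed are hypothetical and chosen only for illustration.

```python
import random

def systematic_sample(units, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)    # random starting position: 0, 1, ..., k - 1
    return units[start::k]

# Hypothetical registration roll of 1000 voters.
voters = [f"voter_{i}" for i in range(1, 1001)]

sample = systematic_sample(voters, k=25, rng=random.Random(7))
print(len(sample), sample[:3])
```

With 1000 voters and k = 25 the sample always contains 40 voters, whichever starting point is drawn.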
Stratified Sampling
The population is divided into groups (strata) usually
with meaningful differences, and a sample is chosen
from each group.
Examples
• Choosing 200 men and 200 women for a sample
• Stratify the population by income level and then
choose a sample of low, middle, and high income
individuals
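The first example, 200 men and 200 women, can be sketched as follows. The function name `stratified_sample`, the strata of 1000 units each and the seed are assumptions for illustration. Each stratum gets its own simple random sample here, which is one common way to select within strata.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of the same size from every stratum."""
    return {name: rng.sample(units, k=n_per_stratum)
            for name, units in strata.items()}

# Hypothetical population, already divided into two strata.
strata = {
    "men": [f"M{i}" for i in range(1, 1001)],
    "women": [f"W{i}" for i in range(1, 1001)],
}

sample = stratified_sample(strata, n_per_stratum=200, rng=random.Random(3))
print({name: len(units) for name, units in sample.items()})
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of 400 people would not guarantee in exactly these proportions.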
Cluster Sampling
The population is divided into groups in a more or less random way, and then a
sample is chosen by randomly selecting entire groups.
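A minimal sketch of this idea in Python: groups are selected at random, and every member of a selected group enters the sample. The six villages of five people each, the function name `cluster_sample` and the seed are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters and return all units inside them."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical population: six villages (clusters) of five people each.
clusters = {
    f"village_{c}": [f"village_{c}_person_{i}" for i in range(1, 6)]
    for c in "ABCDEF"
}

chosen, sample = cluster_sample(clusters, n_clusters=2, rng=random.Random(11))
print(chosen, len(sample))
```

Unlike stratified sampling, which samples from every group, cluster sampling leaves out the groups that were not selected.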
The end of chapter 3
Thank you!
Questions?