Chapter 1

INTRODUCTION
AND DEFINITIONS
SSTS001 2024
CONTENTS
1.1 What is statistics
1.2 The language of statistics
– Random variable and its data
– Population and its measure
– Sample and its measure
1.3 Components of Statistics
– Descriptive statistics
– Inferential statistics
– Statistical modelling
1.4 Computer and Software for statistical analysis

1.1 What is statistics?
• Statistics can be defined as a set of mathematically based methods and techniques that transform sets of raw (unprocessed) data into a few meaningful summary measures, relationships, patterns and trends, which then convey useful and usable information to support decision making.

Example 1
Raw data: Colour preference of SSTS000 students

blue blue yellow blue
black black black black
green blue blue yellow
yellow green black black
yellow black blue green
black black black blue
green blue black black
black black green black
yellow black black green
blue black blue blue

Frequency Table

Colour preference   Number
Black               18
Blue                11
Green               6
Yellow              5
Total               40
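A minimal Python sketch of how the frequency table can be produced from the raw data of Example 1: the 40 responses are typed in row by row and the standard-library `collections.Counter` tallies each colour. The variable names are illustrative. The percentage column gives the shares behind the pie chart (45%, 27.5%, 15% and 12.5%), which the chart shows as whole-number percentages.

```python
from collections import Counter

# Raw colour preferences from Example 1, typed row by row (10 rows x 4 students = 40 responses)
raw = """
blue blue yellow blue
black black black black
green blue blue yellow
yellow green black black
yellow black blue green
black black black blue
green blue black black
black black green black
yellow black black green
blue black blue blue
"""

responses = raw.split()        # split on any whitespace into a flat list of colours
counts = Counter(responses)    # frequency of each category
total = sum(counts.values())

# Print the frequency table, most frequent colour first, with each colour's percentage share
for colour, number in counts.most_common():
    print(f"{colour.capitalize():<8}{number:>4}{100 * number / total:>8.1f}%")
print(f"{'Total':<8}{total:>4}")
```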

[Figure: Bar chart "Colour preference" (number of students): Black 18, Blue 11, Green 6, Yellow 5]

[Figure: Pie chart "Colour preference": Black 45%, Blue 27%, Green 15%, Yellow 13%]

Can the above information help you to make a decision?

If you were selling T-shirts, which colour do you think would sell the most?

Example 2
Raw data: Ages of people in a village

2    13   87   6
3    12   14   7
79   40   69   2
85   5    11   5
90   20   5    5

Results
Mean                 28
Standard Error       7.44241
Median               11.5
Mode                 5
Standard Deviation   33.28347
Sample Variance      1107.789
Kurtosis             -0.58675
Skewness             1.105807
Range                88
Minimum              2
Maximum              90
Sum                  560
Count                20
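Most of the Results table can be reproduced with Python's standard `statistics` module. This is a minimal sketch, assuming the 20 ages are entered exactly as listed; kurtosis and skewness are left out to keep it short. `statistics.stdev` and `statistics.variance` use the divisor n − 1, which matches the Standard Deviation and Sample Variance in the table, and the standard error is computed as s/√n.

```python
import math
import statistics

# Ages of people in a village (Example 2), entered row by row
ages = [2, 13, 87, 6,
        3, 12, 14, 7,
        79, 40, 69, 2,
        85, 5, 11, 5,
        90, 20, 5, 5]

n = len(ages)
sd = statistics.stdev(ages)   # sample standard deviation (divisor n - 1)

summary = {
    "Mean": statistics.mean(ages),
    "Standard Error": sd / math.sqrt(n),
    "Median": statistics.median(ages),
    "Mode": statistics.mode(ages),
    "Standard Deviation": sd,
    "Sample Variance": statistics.variance(ages),
    "Range": max(ages) - min(ages),
    "Minimum": min(ages),
    "Maximum": max(ages),
    "Sum": sum(ages),
    "Count": n,
}

# Print each summary measure, rounded to 7 significant figures
for name, value in summary.items():
    print(f"{name:<20}{value:.7g}")
```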

Statistical analysis in decision making

Input: Data [raw values]
Process: Statistical analysis [transformation process]
Output: Information [statistical summary measures, relationships, patterns, trends]
Benefit: Management decision making

1.2 The language of statistics

• A random variable is any attribute or characteristic being measured or observed.
Examples
✓ Ages of employees in a company
✓ Students' hair colours at UL
✓ Distance travelled per day by delivery vehicles
✓ Country of origin of foreign students at UL
Cont.…
• Data are the actual values (numbers) or outcomes recorded on a random variable.
Examples
✓ Ages of employees in a company
❖ Years: {25, 30, 45, 65, 20, 33}
✓ A person’s hair colour
❖ Colour: {blond, brunette, black, etc.}
✓ Distance travelled per day by delivery vehicles
❖ km: {22, 7, 45, 6, 26}
✓ Country of origin of foreign students at UL
❖ Country: {USA, Zimbabwe, Malawi, etc.}

Population vs. Sample
[Figure: a population and a sample drawn from it]

Cont.
• A population (or universe) is the set of all items of interest in a statistical problem. Equivalently, it represents every possible item that contains a data value (measurement or observation) of the random variable under study.
Examples
✓ All ages of employees in a company
✓ All students' hair colours at UL
Cont.
• A population parameter is the actual value of a summary measure (such as the mean) of a random variable in a population.
Examples
✓ The actual average age of all employees in a company
✓ The actual average distance travelled per day by all delivery vehicles

Results
Ages of all people in a village

2    13   87   6
3    12   14   7
79   40   69   2
85   5    11   5
90   20   5    5

Mean                 28
Standard Error       7.44241
Median               11.5
Mode                 5
Standard Deviation   33.28347
Sample Variance      1107.789
Kurtosis             -0.58675
Skewness             1.105807
Range                88
Minimum              2
Maximum              90
Sum                  560
Count                20

Cont.
• A sample is a set of data drawn from the population, that is, a subset of items drawn from a population. A descriptive measure of a sample is called a statistic.
Example

Cont.
• A sample statistic is the value of a summary measure (such as the mean) of a random variable, derived from sample data.

Results
Ages of some of the people in a village

2    87   6
7
79   40   69
85   5    11   5
90   5    5

Mean                 35.42857
Standard Error       10.00918
Median               9
Mode                 5
Standard Deviation   37.45092
Sample Variance      1402.571
Kurtosis             -1.76981
Skewness             0.562183
Range                88
Minimum              2
Maximum              90
Sum                  496
Count                14
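The slides describe the 20 ages as "all people in a village" and the 14 ages as "some of the people", so their means illustrate a population parameter (μ = 28) and a sample statistic (x̄ ≈ 35.43). A short Python sketch under exactly that framing; the difference x̄ − μ shows that a statistic generally does not equal the parameter it estimates.

```python
import statistics

# The 20 ages treated as the whole village (population of size N)
population = [2, 13, 87, 6, 3, 12, 14, 7, 79, 40, 69, 2, 85, 5, 11, 5, 90, 20, 5, 5]

# The 14 ages on the "some of the people" slide, treated as a sample of size n
sample = [2, 87, 6, 7, 79, 40, 69, 85, 5, 11, 5, 90, 5, 5]

mu = statistics.mean(population)   # population parameter (mu)
x_bar = statistics.mean(sample)    # sample statistic (x-bar)

print(f"N = {len(population)}, mu = {mu}")
print(f"n = {len(sample)}, x_bar = {x_bar:.5f}")
print(f"x_bar - mu = {x_bar - mu:.5f}")   # sampling error of this particular sample
```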
Symbolic notations for sample and population measures

Statistical measure   Sample statistic   Population parameter
Mean                  x̄                  μ
Standard deviation    s                  σ
Variance              s²                 σ²
Size                  n                  N
Proportion            p                  π
Correlation           r                  ρ
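With the usual definitions, s² and σ² differ by more than notation: s² divides the sum of squared deviations by n − 1, while σ² divides it by N. The Excel summary for "all people in a village" reports the n − 1 versions: its Sample Variance of 1107.789 equals 21048/19, where 21048 is the sum of squared deviations from the mean of 28. Treating those 20 ages as the whole population gives σ² = 21048/20 = 1052.4. A minimal sketch using Python's standard `statistics` module, with the formulas also written out by hand:

```python
import statistics

ages = [2, 13, 87, 6, 3, 12, 14, 7, 79, 40, 69, 2, 85, 5, 11, 5, 90, 20, 5, 5]

# Sample variance s^2: sum of squared deviations divided by n - 1
s2 = statistics.variance(ages)
# Population variance sigma^2: sum of squared deviations divided by N
sigma2 = statistics.pvariance(ages)

# The same calculations written out explicitly
mean = sum(ages) / len(ages)
ss = sum((x - mean) ** 2 for x in ages)   # sum of squared deviations
assert abs(s2 - ss / (len(ages) - 1)) < 1e-9
assert abs(sigma2 - ss / len(ages)) < 1e-9

print(f"s^2 = {s2:.3f}, s = {statistics.stdev(ages):.5f}")
print(f"sigma^2 = {sigma2:.3f}, sigma = {statistics.pstdev(ages):.5f}")
```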
1.3 Components of Statistics

There are three components of statistics:

• Descriptive statistics
✓ Condenses large volumes of data into a few summary measures.

• Inferential statistics
✓ Generalises sample findings to a broader population.

• Statistical modelling
✓ Builds relationships between variables to make predictions.
1.4 Computer and Software for statistical analysis

• There are many widely used software packages for statistical analysis. Some of them are listed below.
• Minitab
• SAS
• SPSS
• Stata
• R
• Apart from the packages listed above, simple statistical analyses of data can also be carried out with the “Data Analysis” tool in Microsoft Excel.
DATA
2.1 Data types
2.2 Data sources
2.3 Data collection methods

Data types

Data is the raw material of statistical analysis.

Data quality is influenced by three important factors:
• The type of data available for analysis
• The source from which the data comes
• The data collection methods used to record the data

Data types
Data can be classified in three ways:
• Classification 1: categorical versus numeric (or qualitative versus quantitative)
• Classification 2: nominal, ordinal, interval and ratio-scaled
• Classification 3: discrete versus continuous

Classification 1

Categorical data / qualitative data
Categorical (qualitative) data refer to data representing categories of outcomes of a random variable. Categorical data are number-like codes arbitrarily assigned to the different category labels. As a general rule, if a response can only be counted, the data type is categorical.
EXAMPLES OF CATEGORICAL DATA
Random variable              Categories            Codes
Gender                       Female                1
                             Male                  2
Country of origin            South Africa          1
                             Botswana              2
                             Namibia               3
Highest qualification        School certificate    1
                             Diploma               2
                             Degree                3
Do you exercise regularly?   Yes                   1
                             No                    2
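Because categorical codes are arbitrary labels, the meaningful operation is counting, not arithmetic. A minimal Python sketch, assuming the coding scheme for "Do you exercise regularly?" from the table (1 = Yes, 2 = No); the eight responses are hypothetical and made up for illustration.

```python
from collections import Counter

# Coding scheme from the slide; the responses below are hypothetical
labels = {1: "Yes", 2: "No"}
responses = [1, 2, 2, 1, 1, 2, 1, 1]

# Counting is meaningful for categorical data ...
counts = Counter(labels[code] for code in responses)
print(counts)

# ... but arithmetic on the codes is not: this "average" of 1.375 is not an answer
# anyone gave, and it would change if the codes were assigned the other way round.
meaningless_mean = sum(responses) / len(responses)
print(meaningless_mean)
```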
Classification 1 Cont.
Numeric data
Numeric data are real numbers that can be manipulated using arithmetic operations (i.e. addition, subtraction, multiplication and division) to produce meaningful results. As a general rule, if the outcome of a random variable can be “measured”, the data type is numeric.

EXAMPLES OF NUMERIC DATA
Random variable                       Numeric response range   Numeric data value
Employee years of service             1-20 years               18 years
Hourly rate for cleaners (R/hour)     R51-R60                  R55
Children under 10 years in a family                            2 children

Classification 2
Nominal
Nominal data are a subclassification of categorical data, used for categories of equal importance (there is no ranking between them).
Examples
Random variable   Nominal response
Gender            1=female 2=male
Home language     1=Xhosa 2=Zulu 3=Afrikaans 4=Other
Classification 2 cont.
Ordinal
Ordinal data are a subclassification of categorical data in which there is an implied ranking between the categories.
Examples
Random variable       Ordinal categories   Ranked codes
Company size          Small                1
                      Medium               2
                      Large                3
Magazine preference   True Love            1 (top)
                      Move                 2 (2nd)
                      Drum                 3 (3rd)
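For ordinal data the codes carry an order, so ranking-based summaries such as sorting and the median category make sense, while the gaps between codes (Medium − Small versus Large − Medium) have no fixed meaning. A minimal Python sketch, assuming the company-size coding from the slide; `CompanySize` is a hypothetical name and the responses are made up for illustration.

```python
import statistics
from enum import IntEnum

# Ordinal categories from the slide: the codes carry an order (Small < Medium < Large)
class CompanySize(IntEnum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

# Hypothetical survey responses
responses = [CompanySize.SMALL, CompanySize.LARGE, CompanySize.MEDIUM,
             CompanySize.SMALL, CompanySize.MEDIUM]

ordered = sorted(responses)                  # ranking is meaningful
middle = statistics.median_low(responses)    # median that is always an actual category

print([size.name for size in ordered])
print(middle.name)
print(CompanySize.LARGE > CompanySize.SMALL)  # comparisons respect the implied ranking
```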
Classification 2 cont.
Interval
Interval data are a subclassification of numeric data, generated mainly from rating scales.
Examples
“How satisfied were you with your matric results?”
Very dissatisfied   Dissatisfied   Satisfied   Very satisfied
1                   2              3           4
Classification 2 cont.
Ratio-scaled data
Ratio-scaled data are pure numeric data. The values are derived from direct measurement on a scale with an absolute origin of zero.
Examples
Random variable                     Illustrative numeric data values
Price of milk per litre             R9.50
Number of children in a household   4 children
Classification 3
Discrete data consist of whole numbers only.
Examples
Random variable                        Discrete values
Discrete numeric
Number of sales transactions per day   43, 86, 118, 211
Number of children in a family         0, 1, 2, 3, 4, 5
Discrete categorical
Gender                                 1=male 2=female
Do you exercise regularly?             1=no 2=yes
Classification 3 Cont.
Continuous data
Continuous data are numeric data that can validly take on any value in an interval.
Examples
Random variable              Valid interval   Data values
Age of employee              21-65 years      27
Distance from home to work   1-30 km          6.68 km
Summary of data types

Classification 1
Categorical data
❖ Number-like data codes for outcomes
Numeric data
❖ Real numbers

Classification 2
Nominal-scaled data
❖ Equal importance
Ordinal-scaled data
❖ Implied ranking
Interval-scaled data
❖ Rating scales
Ratio-scaled data
❖ Pure numbers

Classification 3
Discrete data
❖ Consists of whole numbers only
• Discrete numeric
• Discrete categorical
Continuous data
❖ Can take any value in an interval
Univariate vs Bivariate data

Univariate data
Data from a study that looks at only one variable.

Bivariate data
Data from a study that examines the relationship between two variables.
Data sources

• Internal versus external sources
• Primary versus secondary sources
• Internal data refers to data available from within an organisation.

• External data refers to data available from outside an organisation.
• Primary data is data captured at the point where it is generated. Such data is captured for the first time and with a specific purpose in mind.

• Secondary data is data collected and processed by others for a purpose other than the problem at hand.
Advantages of primary data
• Primary data are directly relevant to the problem at hand.
• They generally offer greater control over data accuracy.
Disadvantages of primary data
• Primary data can be time-consuming to collect.
• They are generally more expensive to collect.
Advantages of secondary data
• Secondary data already exists
• Access time is relatively short, especially if
the data is accessible through the internet.
• Secondary data is generally less
expensive to acquire than primary data.

Disadvantages of secondary data
• Secondary data may not be problem-specific (relevance).
• Currency of data: data may be dated (“old”) and hence inappropriate.
• It may be difficult to assess data accuracy.
• Secondary data may not be open to further manipulation, or may not be at the right level of aggregation.
Data collection methods
• Through observation
• By conducting surveys
❑ Personal interviews
❑ Postal surveys
❑ Telephone interviews
❑ E-surveys
• Through experiments
Questions?

Chapter 3

Sampling Methods
Key Definitions
• A population is the collection of all items of interest or under investigation
• N represents the population size
• A sample is an observed subset of the population
• n represents the sample size
• A parameter is a specific characteristic of a population
• A statistic is a specific characteristic of a sample
Population vs. Sample
[Figure: a population and a sample drawn from it]
Why Sample?
• Less time-consuming than a census
• Less costly to administer than a census
• It is possible to obtain statistical results of sufficiently high accuracy based on samples.
Methods of Sampling
There are two basic methods of sampling:
• Non-probability sampling methods
• Probability sampling methods
Non-probability sampling methods
There are four types of non-probability
sampling methods:
• Convenience sampling
• Judgement sampling
• Quota sampling
• Snowball sampling

Convenience sampling
• When the sampling units are selected to suit the convenience of the researcher, convenience sampling has been applied. Respondents are included in the sample if they happen to be in the right place at the right time; individuals are chosen for the sample because they are easy to include.
Example
• Internet polls
• Mail-in customer surveys
Judgement sampling
When the researcher uses personal judgement alone to select the sampling units he or she considers most appropriate for providing data that address the question under study, judgement sampling has been applied.
Example
• Surveying only labour consultants, instead of selecting from employees in general
Quota sampling
The researcher fills a set number (quota) of respondents from each subgroup of interest. The selection is non-random, so the sample is open to selection bias.
Examples
• A researcher might ask for a sample of 100 females, or 100 individuals between the ages of 20 and 30.
• In a study where the researcher wants to compare the academic performance of the different high school class levels, and its relationship with gender and socioeconomic status, the researcher first identifies the subgroups.
Snowball sampling
Snowball sampling is used to reach target populations where the sampling units are difficult to identify. Under snowball sampling, each identified member of the target population is asked to identify other sampling units who belong to the same target population. The issues under investigation are usually confidential or sensitive in nature.
Example
• Identifying AIDS sufferers
Probability sampling methods
There are four probability-based sampling methods:
• Simple random sampling
• Systematic random sampling
• Stratified random sampling
• Cluster random sampling
Random Sampling
Every member of the population has an equal chance of being selected, so the sample is chosen as a result of chance occurrences.

Example
• Telephone polling using random telephone numbers
• Drawing names out of a hat
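Drawing names out of a hat can be mimicked in code. A minimal Python sketch, assuming a hypothetical population of 40 numbered students; the standard-library `random.Random.sample` draws without replacement, so every member has the same chance of selection. The seed is only there to make the draw reproducible.

```python
import random

# Hypothetical population: 40 students identified by number
population = list(range(1, 41))

rng = random.Random(2024)              # seeded so the draw can be repeated
sample = rng.sample(population, k=4)   # 4 members drawn without replacement
print(sample)
```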
Systematic Sampling
The population is placed on a list, a random starting point is
chosen and then every k-th member is selected.

Example
• Choosing a sample of registered voters by choosing
every 25th voter from the county registration roll
• Testing every 300th product from the assembly line
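The systematic procedure (list, random start, every k-th member) can be sketched in Python as follows. This assumes the common choice of interval k = N // n, which is an assumption rather than something the slide specifies; `systematic_sample` is a hypothetical helper and the product list is illustrative. "Every 25th voter" corresponds to k = 25.

```python
import random

def systematic_sample(population, n, rng=random):
    """Select every k-th item after a random start, where k = N // n (hypothetical helper)."""
    if not 0 < n <= len(population):
        raise ValueError("sample size must be between 1 and the population size")
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random starting point within the first interval
    return population[start::k][:n]   # every k-th item from the start, capped at n

# Hypothetical example: a numbered list of 1000 products, sample of 20 (so k = 50)
products = list(range(1, 1001))
picked = systematic_sample(products, 20, random.Random(7))
print(picked)
```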

Stratified Sampling
The population is divided into groups (strata) usually
with meaningful differences, and a sample is chosen
from each group.

Examples
• Choosing 200 men and 200 women for a sample
• Stratify the population by income level and then
choose a sample of low, middle, and high income
individuals
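A minimal Python sketch of stratified sampling: a simple random sample is drawn separately from each stratum. The income strata, their sizes and the ID labels are hypothetical, and equal allocation (the same number from every stratum, as in "200 men and 200 women") is assumed; proportional allocation would instead size each draw by the stratum's share of the population.

```python
import random

def stratified_sample(strata, n_per_stratum, rng=random):
    """Draw a simple random sample of the given size from each stratum (hypothetical helper)."""
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

# Hypothetical population divided into income strata (IDs are illustrative)
strata = {
    "low": [f"L{i}" for i in range(1, 501)],
    "middle": [f"M{i}" for i in range(1, 301)],
    "high": [f"H{i}" for i in range(1, 101)],
}

# Equal allocation: 20 people from every stratum
sample = stratified_sample(strata, {"low": 20, "middle": 20, "high": 20}, random.Random(1))
for name, chosen in sample.items():
    print(name, len(chosen), chosen[:3])
```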

Cluster Sampling

The population is divided into groups in a more or less random way, and then a
sample is chosen by randomly selecting entire groups.
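In cluster sampling, whole groups are selected at random and every member of a selected group enters the sample. A minimal Python sketch, assuming a hypothetical population of 10 classes of 30 students each. Unlike stratified sampling, where every group contributes, here only the chosen groups appear in the sample.

```python
import random

# Hypothetical population already divided into clusters (10 classes of 30 students)
clusters = {f"class_{c}": [f"class_{c}_student_{s}" for s in range(1, 31)]
            for c in range(1, 11)}

rng = random.Random(3)
chosen_clusters = rng.sample(sorted(clusters), k=3)   # randomly select 3 entire groups

# Every member of each selected cluster enters the sample
sample = [member for name in chosen_clusters for member in clusters[name]]
print(chosen_clusters, len(sample))
```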

The end of chapter 3

Thank you!

Questions?
