Sem 6 - DSV - Unit 4 - Sampling and Estimation
Komal Rohit
Asst. Professor, IT Dept., GCET
Sampling
Sampling is a method that allows us to get information about
the population based on the statistics from a subset of the
population (sample), without having to investigate every
individual.
Why do we need Sampling
Sampling is done to draw conclusions about populations
from samples, and it enables us to determine a population’s
characteristics by directly observing only a portion (or
sample) of the population.
For example,
to analyse attrition among IT professionals, sources such as LinkedIn and
job portals such as Naukri and Monster can be used as sampling frames.
However, these frames may lack important variables (features) that
are required, such as salary information and other data
captured during exit interviews.
So, ideally, to understand attrition behavior one has to use the data
captured by the human resource departments of multiple
companies.
Steps involved in Sampling
The sample size for analytics projects is determined using factors such as
effect size, standard deviation, desired level of confidence and margin of
error.
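As an illustration of how these factors combine, the sketch below computes the sample size needed to estimate a population mean. It assumes a known population standard deviation and a normal approximation, using n = (z·σ/E)², and it leaves effect size out. The function name and the input values are hypothetical, chosen only for the example.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n whose confidence-interval half-width is <= margin_of_error.

    Assumes sigma (population standard deviation) is known and the
    sample mean is approximately normally distributed.
    """
    # Two-sided critical value z for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, because a fractional observation cannot be collected.
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Hypothetical inputs: sigma = 15, margin of error = 2, 95% confidence.
n_required = sample_size_for_mean(sigma=15, margin_of_error=2, confidence=0.95)
print(n_required)  # 217
```

A smaller margin of error or a higher confidence level increases the required sample size, and so does a larger standard deviation.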
Statistical Bias
An unbiased process will produce error, but the error is random and
does not tend strongly in any direction.
A biased process still has random error in both the x and y
directions, but it also has a systematic offset (bias) in one direction.
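The difference between random error and bias can be sketched with a small simulation. The true value, the size of the offset and the noise level below are all hypothetical. The point is only that random error averages out over many measurements while a systematic offset does not.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

TRUE_VALUE = 50.0  # hypothetical quantity being measured
BIAS = 3.0         # hypothetical systematic offset of the biased process
NOISE_SD = 2.0     # random error present in both processes

# Unbiased process: random error only.
unbiased = [TRUE_VALUE + random.gauss(0, NOISE_SD) for _ in range(10_000)]
# Biased process: the same kind of random error plus a constant offset.
biased = [TRUE_VALUE + BIAS + random.gauss(0, NOISE_SD) for _ in range(10_000)]

unbiased_error = mean(unbiased) - TRUE_VALUE  # close to 0
biased_error = mean(biased) - TRUE_VALUE      # close to BIAS

print(round(unbiased_error, 2), round(biased_error, 2))
```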
Non-Probability Sampling:
Not all elements have an equal chance of being selected.
Consequently, there is a significant risk of ending up with a
non-representative sample, which does not produce
generalizable results.
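A minimal sketch of this risk, assuming a hypothetical population of 100 ordered values and a convenience sample that simply takes the first ten at hand:

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical population: the values 1..100, stored in sorted order.
population = list(range(1, 101))

# Convenience (non-probability) sample: whatever is nearest at hand.
convenience_sample = population[:10]
# Simple random sample of the same size, for comparison.
random_sample = random.sample(population, 10)

population_mean = mean(population)           # 50.5
convenience_mean = mean(convenience_sample)  # 5.5
random_mean = mean(random_sample)            # varies with the seed

print(population_mean, convenience_mean, random_mean)
```

The convenience sample only sees the low end of the population, so its mean is far from the population mean. The random sample has no such built-in tendency.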
Types of Sampling techniques
For example, let’s say our population consists of 20
individuals. Each individual is numbered from 1 to 20 and is
represented by a specific color (red, blue, green, or yellow).
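One way to sample from such a population is simple random sampling, where every individual has the same chance of being picked. The sketch below assumes that technique and a sample of five. How colors are assigned to individuals is also hypothetical, since the text does not specify it.

```python
import random

random.seed(42)

COLORS = ["red", "blue", "green", "yellow"]

# Hypothetical population: individuals 1..20, colors assigned in rotation.
population = {i: COLORS[(i - 1) % len(COLORS)] for i in range(1, 21)}

# Simple random sample: 5 distinct individuals, each equally likely.
sample_ids = sorted(random.sample(sorted(population), 5))
sample = {i: population[i] for i in sample_ids}

print(sample)
```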
Example -
Say our population size is x and we have to select a sample
of size n.
We can describe the sampling distribution of the mean using this notation:
X̄ ~ N(µ, σ/√n)
Where,
X̄ is the sample mean; its sampling distribution has mean µ and standard deviation σ/√n (the standard error)
~ means “follows the distribution”
N is the normal distribution
µ is the mean of the population
σ is the standard deviation of the population
n is the sample size
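The symbols above can be checked with a small simulation: draw many samples of size n from a normal population and compare the spread of the sample means with the standard error σ/√n. The population parameters and the number of repetitions are hypothetical values chosen for the sketch.

```python
import random
from statistics import mean, stdev

random.seed(0)

MU, SIGMA, N = 100.0, 15.0, 25  # hypothetical population mean, sd, sample size
REPEATS = 5_000                 # number of samples drawn

# Each entry is the mean of one sample of size N.
sample_means = [
    mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPEATS)
]

expected_se = SIGMA / N ** 0.5      # sigma / sqrt(n) = 3.0
observed_mean = mean(sample_means)  # close to MU
observed_se = stdev(sample_means)   # close to expected_se

print(round(observed_mean, 2), round(observed_se, 2), expected_se)
```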
Estimation of population parameters
Estimation is a process used for making inferences about
population parameters based on samples.
1. Method of Moments
The basic idea is to express population moments (such as the mean
and variance) in terms of the unknown parameters, set them equal to
the corresponding sample moments, and solve for the parameters.
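A minimal sketch of the method of moments, assuming the data come from an exponential distribution with unknown rate λ. The population mean of that distribution is 1/λ, so setting it equal to the sample mean gives the estimate λ̂ = 1/x̄. The true rate and the sample size below are hypothetical.

```python
import random
from statistics import mean

random.seed(0)

TRUE_RATE = 2.0  # hypothetical parameter, unknown in a real problem

# Simulated sample from an exponential population with that rate.
sample = [random.expovariate(TRUE_RATE) for _ in range(20_000)]

# First population moment: E[X] = 1 / rate.
# Match it to the first sample moment (the sample mean) and solve for rate.
sample_mean = mean(sample)
rate_estimate = 1 / sample_mean

print(round(rate_estimate, 3))
```

Distributions with two parameters need two moment equations, typically the mean and the variance.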