Determining The Sample Size (Continuous Data)
Before you can calculate a sample size, you need to determine a few things about the target population
and the sample you need:
1. Population Size — how many total people fit your demographic? For instance, if you want to
know about mothers living in the US, your population size would be the total number of mothers
living in the US. Don’t worry if you are unsure about this number. It is common for the
population to be unknown or approximated.
2. Margin of Error (Confidence Interval) — No sample will be perfect, so you need to decide how
much error to allow. The confidence interval determines how much higher or lower than the
population mean you are willing to let your sample mean fall. If you’ve ever seen a political poll
on the news, you’ve seen a confidence interval. It will look something like this: “68% of voters
said yes to Proposition Z, with a margin of error of +/- 5%.”
3. Confidence Level — How confident do you want to be that the actual mean falls within your
confidence interval? The most common confidence levels are 90%, 95%,
and 99%.
When X is normally distributed and n observations are sampled, the range of values X̄ ± 1.96σ/√n is
called the 95% confidence interval for µ. The two boundaries of the interval, X̄ − 1.96σ/√n and
X̄ + 1.96σ/√n, are called the 95% confidence limits. That is, there is a 95% chance that the
following statement will be true:
X̄ − 1.96σ/√n ≤ µ ≤ X̄ + 1.96σ/√n
4. Standard Deviation — How much variation do you expect in your responses? Since we haven’t
actually administered our survey yet, use an estimate from a previous survey or historical data,
as the examples below do. (When responses are proportions rather than continuous values, .5 is
the most forgiving number and ensures that your sample will be large enough.)
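As a minimal sketch of the interval described in item 3, the Python snippet below computes confidence limits for a mean when σ is assumed to be known. The function name mean_confidence_interval and the sample figures are illustrative assumptions, not values from this guide. The spread of the sample mean is taken to be the standard error σ/√n.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) confidence limits for the population mean.

    Assumes sigma (the population standard deviation) is known and the
    sample mean is approximately normally distributed.
    """
    # Two-sided critical value; a confidence of 0.95 gives roughly 1.96
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Half-width of the interval: critical value times the standard error
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical numbers: 100 responses averaging 50, with sigma = 10
lower, upper = mean_confidence_interval(50.0, 10.0, 100)
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```

Raising the confidence level widens the interval, while a larger sample narrows it.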
Okay, now that we have these values defined, we can calculate our needed sample size.
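Your confidence level corresponds to a Z-score. This is a constant value needed for this equation. Here
are the Z-scores for the most common confidence levels:
● 90% confidence: Z = 1.645
● 95% confidence: Z = 1.96
● 99% confidence: Z = 2.576
The required sample size for estimating a mean is
n = (Z × σ / E)²
where: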
● Z is the value from the table of probabilities of the standard normal distribution for the desired
confidence level (e.g., Z = 1.96 for 95% confidence)
● E is the margin of error that the investigator specifies as important from a clinical or practical
standpoint.
● σ is the standard deviation of the outcome of interest.
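The three quantities defined above combine in the standard sample size formula for a mean, n = (Z × σ / E)², rounded up to the next whole number. The sketch below is one way to express it in Python. The function name sample_size_for_mean and the figures in the loop are hypothetical. Note that NormalDist returns an unrounded Z (about 1.95996 for 95%), so results can occasionally differ by one from a calculation that uses the rounded table value.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Minimum n so the sample mean falls within +/- margin of the true mean.

    sigma      -- estimated standard deviation of the outcome
    margin     -- margin of error E, in the same units as sigma
    confidence -- confidence level as a fraction (0.95 for 95%)
    """
    # Two-sided Z-score for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: a fractional respondent cannot be sampled
    return ceil((z * sigma / margin) ** 2)


# Hypothetical inputs: sigma = 10 units, margin of error = 2 units
for level in (0.90, 0.95, 0.99):
    n = sample_size_for_mean(sigma=10.0, margin=2.0, confidence=level)
    print(f"{level:.0%} confidence -> n = {n}")
```

Because E is squared in the denominator, halving the margin of error roughly quadruples the required sample size.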
Example 1: We would like to start an ISP and need to estimate the average weekly Internet usage of
households for our business plan and model. How many households must we randomly select to be
95 percent sure that the sample mean is within 1 minute of the population mean µ?
Assume that a previous survey of household usage has shown σ = 6.95 minutes.
Solution
n = (Z × σ / E)² = (1.96 × 6.95 / 1)² = (13.622)² ≈ 185.56
So we will need to sample at least 186 (rounded up) randomly selected households. With this sample we
will be 95 percent confident that the sample mean will be within 1 minute of the true population mean
of Internet usage.
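One way to sanity-check this result is a small simulation. The sketch below draws many samples of 186 households and counts how often the sample mean lands within 1 minute of the true mean. The true mean of 60 minutes, the normal distribution of usage, and the number of trials are all assumptions made purely for illustration; only σ = 6.95, the 1-minute margin, and n = 186 come from the example.

```python
import random
from statistics import fmean

random.seed(0)  # fixed seed so the run is repeatable

SIGMA = 6.95      # standard deviation from the example (minutes)
MARGIN = 1.0      # margin of error from the example (minutes)
N = 186           # sample size found in the example
TRUE_MEAN = 60.0  # hypothetical population mean, chosen for illustration
TRIALS = 2000     # number of simulated surveys (an arbitrary choice)

hits = 0
for _ in range(TRIALS):
    # One simulated survey: N households with normally distributed usage
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    if abs(fmean(sample) - TRUE_MEAN) <= MARGIN:
        hits += 1

coverage = hits / TRIALS
print(f"Share of surveys within {MARGIN} minute of the true mean: {coverage:.3f}")
```

Under these assumptions the printed share should come out close to 0.95, which is what the 95 percent confidence level promises.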
Example 2. At a soft drink bottle filling plant, concerns have been raised about inconsistency in the
amount of concentrate filled per bottle. You wish to estimate the average concentrate filled per bottle
to avoid legal implications. What is the minimum number of bottles that must be sampled randomly to be
95% sure that the mean of the sample is within 0.5 ml of the population mean µ? Historical evidence
shows that σ = 3.75 ml.
Solution
n = (Z × σ / E)² = (1.96 × 3.75 / 0.5)² = (14.7)² = 216.09
You must audit a minimum of 217 bottles (rounded up) to estimate the average concentrate filled per
bottle at the 95% confidence level, such that the sample mean falls within +/- 0.5 ml of the true
population mean.
Example 3. What is the minimum sample size required at the 95% confidence level to estimate the mean
waiting time (in minutes) for patients at Zampa Healthcare? A random sample of 200 historical
waiting times was used to estimate the population standard deviation of waiting time, which came to
6.5 minutes. The process excellence team decides to work with a +/- 1 min margin of error for the
95% confidence interval of the population mean waiting time.
Solution
n = (Z × σ / E)² = (1.96 × 6.5 / 1)² = (12.74)² ≈ 162.31, which rounds up to 163.