Chapter One: Sampling
1.1. Sampling
Definition: Sampling is the process of selecting a number of individuals for a study in such a way that the individuals represent the larger group from which they were selected. In other words, sampling is a method of gathering information about a population by studying a representative part of it, called a sample.
A sample is selected, evaluated, and studied in an effort to gain information about the larger population from which it was drawn. A sample represents a population, and information obtained from the sample is generalized to the entire population from which it was drawn. A well-selected sample can provide information comparable to that obtained by a census. The sampling frame is a list of all elements, or of other units containing the elements, in a population. Can the data gathered from a sample be used to make inferences about the population? Statistically speaking, yes, provided the sample is properly selected.
However, different samples generally give different values of a statistic. A statistic is therefore a random variable, because its value varies from one sample to another.
For example, if one studies the performance of freshman students at a college, each student is the sampling unit.
Advantages of sampling
Errors in sampling
1. Sampling error: the discrepancy between a population value (parameter) and the corresponding sample value (statistic). It may arise from an inappropriate sampling technique.
2. Non-sampling errors: errors due to procedural bias, such as:
o incorrect responses,
o measurement errors, and
o errors at different stages in processing the data.
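To make the definition of sampling error concrete, here is a minimal Python sketch. The population of ten household incomes is made up for illustration, and the function name `sampling_error` is hypothetical. It draws one simple random sample and reports the sample mean minus the population mean; observing the whole population (a census) leaves no sampling error.

```python
import random
from statistics import mean

# Hypothetical population of 10 household incomes (made-up numbers, not real data)
population = [12, 15, 9, 20, 18, 11, 14, 16, 10, 25]

def sampling_error(population, n, seed=None):
    """Draw one simple random sample of size n and return
    (sample mean, population mean, sampling error)."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)   # sample drawn without replacement
    statistic = mean(sample)             # sample value
    parameter = mean(population)         # population value
    return statistic, parameter, statistic - parameter

stat, param, err = sampling_error(population, n=4, seed=1)
print(f"sample mean = {stat}, population mean = {param}, sampling error = {err}")

# Taking every unit (a census) leaves no sampling error
print(sampling_error(population, n=len(population), seed=1)[2])
```

Different seeds give different samples and therefore different sampling errors, which is the sense in which a statistic is a random variable.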
1. Simple random sampling: a method of sampling in which every possible sample of a given size has an equal chance of selection. Let n denote the number of subjects in the sample; this number is called the sample size. A simple random sample of n subjects from a population is one in which each possible sample of that size has the same probability (chance) of being selected.
Because every subject has the same chance of inclusion in the sample, simple random sampling is fair. This reduces the chance that the sample is seriously biased in some way, which would lead to inaccurate inferences about the population.
Most inferential statistical methods assume randomization of the sort provided by random
sampling.
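A minimal Python sketch of simple random sampling, assuming the sampling frame is held as a Python list (the six-subject frame and the function name `simple_random_sample` are hypothetical). Python's `random.sample` draws without replacement, so every subset of size n is equally likely; the repeated draws at the end check empirically that each of the C(6, 3) = 20 possible samples turns up about equally often.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb

def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of n subjects (without replacement) from a sampling frame."""
    return random.Random(seed).sample(frame, n)

# Hypothetical sampling frame of six subjects
frame = ["A", "B", "C", "D", "E", "F"]
print(simple_random_sample(frame, 3, seed=42))

# There are C(6, 3) = 20 possible samples of size 3
possible = list(combinations(frame, 3))
print(len(possible), comb(6, 3))  # 20 20

# Draw many samples and count how often each (unordered) sample occurs
rng = random.Random(0)
counts = Counter(frozenset(rng.sample(frame, 3)) for _ in range(20_000))
# Each count should be close to 20_000 / 20 = 1000
print(min(counts.values()), max(counts.values()))
```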
Random numbers are numbers that are computer generated according to a scheme whereby
each digit is equally likely to be any of the integers 0, 1, 2, …, 9 and does not depend on the
other digits generated.
o The numbers fluctuate according to no set pattern. Any particular digit has the same chance of being a 0, 1, 2, …, or 9.
o The numbers are chosen independently, so any digit chosen has no influence on any other selection. If the first digit in a row of the table is 9, for instance, the next digit is still just as likely to be a 9 as a 0 or a 1 or any other number.
o Random numbers are available in published tables and can be generated with
software and many statistical calculators.
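As an illustration of generating random digits with software, here is a small Python sketch. The function name `random_digit_table` and the row/column layout are assumptions for illustration. Python's `random` module is a pseudo-random generator, which is adequate for classroom examples; fixing `seed` reproduces the same table.

```python
import random
from collections import Counter

def random_digit_table(rows, cols, seed=None):
    """Return a table of random digits, one string of `cols` digits per row.

    Each digit is drawn independently and uniformly from 0-9, like the rows
    of a printed random number table.
    """
    rng = random.Random(seed)
    return ["".join(str(rng.randrange(10)) for _ in range(cols)) for _ in range(rows)]

for row in random_digit_table(rows=3, cols=20, seed=7):
    print(row)

# With many digits, each of 0-9 appears roughly 10% of the time
digits = "".join(random_digit_table(rows=100, cols=1000, seed=0))
print(sorted(Counter(digits).items()))
```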
Example: Suppose you want to select a simple random sample of 10 households from a total of 20 households to study their socioeconomic status. The sampling frame is a directory of these households. You can select the households by using two-digit random numbers to identify them, as follows:
(1) Assign the numbers 01 to 20 to the households in the directory, using 01 for the first household in the list, 02 for the second household, and so on.
(2) Starting at any point in the above Table, choose successive two-digit numbers until you obtain 10 distinct numbers between 01 and 20.
(3) Include in the sample the households whose assigned numbers equal the random numbers selected.
(4) For example, using the first row of the above Table, the first 5 usable two-digit random numbers are 10, 15, 01, 02 and 14. Notice that we skipped the numbers greater than 20, since no household in the directory has an assigned number greater than 20.
(5) After using the first row of the above Table, move to the next row of numbers and continue.
Note: The column (or row) from which you begin selecting the number does not matter,
since the numbers have no set pattern.
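Steps (1) to (5) can be sketched in Python as follows. The digit string stands for the rows of a random number table read one after another; it is made up, constructed so that its first five usable numbers match those quoted in step (4), and it is not the actual Table. The function name `select_from_digits` is hypothetical. Following steps (2) and (4), numbers outside 01 to 20 (including 00) and repeats are skipped.

```python
def select_from_digits(digits, population_size, sample_size):
    """Read successive two-digit numbers from a string of random digits and return
    the first `sample_size` distinct numbers between 1 and `population_size`."""
    selected = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])          # e.g. "01" -> 1
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
            if len(selected) == sample_size:
                return selected
    raise ValueError("ran out of random digits before the sample was complete")

# Made-up digits (rows of a table joined together), not the Table from the notes
digits = "10371501990245141015200703551811"
households = select_from_digits(digits, population_size=20, sample_size=10)
print(households)  # [10, 15, 1, 2, 14, 20, 7, 3, 18, 11]
```

Here 37, 99, 45 and 55 are skipped because they exceed 20, and the second 10 and 15 are skipped because those households are already in the sample.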
2. Systematic random sampling: select one subject at random from the first k subjects in the sampling frame, where k = N/n, and then select every kth subject after it.
Example: Suppose we want a systematic random sample of 100 households, to study the saving habits of the community, from a population of 30,000 households listed in a campus directory. Here, n = 100 and N = 30,000, so k = 30,000/100 = 300. The population size is 300 times the sample size, so we select one of every 300 households.
We select one household at random from the first 300 listed, and then select every 300th household after the one selected randomly. This produces a sample of size 100.
The first three digits in the random number table above (from slide 10) are 104, which falls between 001 and 300, so we first select the household numbered 104.
The numbers of the other households selected are 104 + 300 = 404, 404 + 300 = 704, 704 + 300 = 1004, 1004 + 300 = 1304, and so on. The 100th household selected, number 104 + 99 × 300 = 29,804, is listed in the last 300 names in the directory.
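The systematic selection in this example can be reproduced with a short Python sketch. The function name `systematic_sample` is hypothetical, and the sketch assumes N is a multiple of n, as it is here. Passing `start=104` reproduces the example; leaving `start` out draws the random start from 1 to k.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return the list positions (1-based) chosen by systematic random sampling.

    k = N // n is the sampling interval. The starting position is drawn at
    random from 1..k unless given, and then every kth unit after it is taken.
    """
    k = N // n
    if start is None:
        start = random.Random(seed).randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(n)]

# Reproduce the example: N = 30,000 households, n = 100, random start 104
positions = systematic_sample(N=30_000, n=100, start=104)
print(positions[:5])   # [104, 404, 704, 1004, 1304]
print(positions[-1])   # 29804, within the last 300 names (29,701 to 30,000)
```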
3. Stratified random sampling: another probability sampling method, useful in social science research for studies comparing groups. A stratified random sample divides the population into smaller subgroups, called strata, according to some chosen classification category such as age, gender, or geographic location, and then selects a simple random sample from each stratum.
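A minimal Python sketch of drawing a stratified random sample: a separate simple random sample is taken within each stratum. The age-group strata, unit IDs, per-stratum sample sizes and the function name `stratified_sample` are all hypothetical.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Select a simple random sample of sizes[h] units from each stratum h.

    strata: dict mapping stratum name -> list of units in that stratum
    sizes:  dict mapping stratum name -> number of units to sample from it
    """
    rng = random.Random(seed)
    return {h: rng.sample(units, sizes[h]) for h, units in strata.items()}

# Hypothetical population stratified by age group (made-up unit IDs)
strata = {
    "18-29": [f"Y{i}" for i in range(50)],
    "30-49": [f"M{i}" for i in range(80)],
    "50+":   [f"O{i}" for i in range(70)],
}
sample = stratified_sample(strata, {"18-29": 5, "30-49": 8, "50+": 7}, seed=1)
for h, units in sample.items():
    print(h, units)
```

Every stratum is guaranteed to appear in the sample, which a single simple random sample of 20 units from all 200 would not guarantee.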
Example: Taxpayer A sells three types of computer equipment: laptops, desktops, and network servers. The auditor decides to stratify the total population of sales by type of computer equipment, since that tends to create more homogeneous sub-populations. There were $2,000,000 of laptop sales during the period, $3,000,000 of desktop sales, and $5,000,000 of server sales. The auditor should allocate the total sample size as follows: 100 from laptop sales, 150 from desktop sales, and 200 from network server sales. The results were as follows:
For example, a population may consist of males and females who are smokers or nonsmokers. The researcher will want to include in the sample people from each group, that is, males who smoke, males who do not smoke, females who smoke, and females who do not smoke. To accomplish this, the researcher divides the population into four subgroups and then selects a random sample from each subgroup. This method ensures that the sample is representative with respect to gender and smoking. Stratified random sampling is called proportional if the sampled strata proportions are the same as those in the entire population.
For example, if 90% of the population of interest are men and 10% are women, then the sampling is proportional if the sample size for men is nine times the sample size for women. Stratified random sampling is called disproportional if the sampled strata proportions differ from the population proportions. This is useful when the population size for a stratum is relatively small: a group that comprises a small part of the population may not have enough representation in a simple random sample to allow precise inferences.
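The difference between proportional and disproportional allocation is arithmetic on the stratum sizes. A minimal Python sketch follows; the function names are hypothetical, the 900/100 population mirrors the 90%/10% example above, and the largest-remainder rounding used when the exact shares are not whole numbers is an assumption (other rounding rules are possible).

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to the stratum sizes N_h,
    i.e. n_h = n * N_h / N, rounded so that the n_h add up to n."""
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}          # round down first
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts
    for h in sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc

def is_proportional(stratum_sizes, alloc):
    """True if every stratum's share of the sample equals its share of the population."""
    N, n = sum(stratum_sizes.values()), sum(alloc.values())
    return all(alloc[h] * N == stratum_sizes[h] * n for h in stratum_sizes)

population = {"men": 900, "women": 100}          # 90% men, 10% women
prop = proportional_allocation(population, n=50)
print(prop, is_proportional(population, prop))   # {'men': 45, 'women': 5} True

# Oversampling the small stratum gives a disproportional design
disprop = {"men": 25, "women": 25}
print(is_proportional(population, disprop))      # False
```

With 45 men and 5 women the sample size for men is nine times that for women, matching the proportional case described above.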
4. Cluster random sampling: Simple, systematic, and stratified random sampling are often difficult to implement because they require a complete sampling frame. Such lists are easy to obtain when sampling cities or hospitals, for example, but more difficult to obtain when sampling individuals or families. A cluster sample is a sample obtained by selecting preexisting or natural groups, called clusters, and using the members of the selected clusters for the sample. Clusters are often geographical regions: we divide a region (say, a city) into sub-regions (say, blocks, subdivisions, or schools).
Cluster sampling is used in large geographic samples where no list of all the units in the population is available but the population boundaries can be well defined. Cluster sampling is cheap and quick, and it can be reasonably accurate. However, because people in the same neighborhood tend to be similar in income, ethnicity, educational background, and so on, a cluster sample is usually less precise than a simple random sample of the same size.
Example 1: The most common cluster used in research is a geographical cluster. For example, a researcher wants to survey the monthly income of households in Ethiopia. He can divide the entire population (the population of Ethiopia) into different clusters (cities). Then the researcher selects a number of clusters, depending on the research, through simple or systematic random sampling. From the selected clusters (randomly selected cities), the researcher can either include all the households as subjects or select a number of subjects from each cluster through simple or systematic random sampling.
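The two options in Example 1 (take every household in the selected clusters, or subsample within them) can be sketched in Python as below. The six cities, their household IDs and the function name `two_stage_cluster_sample` are hypothetical, and simple random sampling is used at both stages.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Stage 1: select n_clusters clusters by simple random sampling.
    Stage 2: include every unit of each selected cluster (n_per_cluster=None),
    or a simple random sample of n_per_cluster units from each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: list(clusters[name]) if n_per_cluster is None
        else rng.sample(clusters[name], n_per_cluster)
        for name in chosen
    }

# Hypothetical clusters: six cities, each with 20 made-up household IDs
clusters = {city: [f"{city}-{i}" for i in range(1, 21)]
            for city in ["City A", "City B", "City C", "City D", "City E", "City F"]}

all_households = two_stage_cluster_sample(clusters, n_clusters=2, seed=3)
subsample = two_stage_cluster_sample(clusters, n_clusters=3, n_per_cluster=5, seed=3)
print({city: len(h) for city, h in all_households.items()})
print(subsample)
```

Only the selected clusters need a list of their households, which is what makes the method practical when no full sampling frame exists.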
Example 2: To obtain information about the drug habits of all high school students in a state, you could obtain a list of all the school districts in the state and select a simple random sample of school districts. Then, within each selected school district, list all the high schools and select a simple random sample of high schools. Within each selected high school, list all high school classes and select a simple random sample of classes. Then use the high school students in those classes as your sample.
Example: What is the difference between a stratified sample and a cluster sample?
Solution: A stratified sample uses every stratum. The strata are usually groups we want to
compare. By contrast, a cluster sample uses a sample of the clusters, rather than all of them. In
cluster sampling, clusters are merely ways of easily identifying groups of subjects. The goal is
not to compare the clusters but to use them to obtain a sample. Most clusters are not represented
in the eventual sample.
The main methods of non-probability sampling are convenience, judgmental, and quota sampling.
1. Convenience sampling: the sample is drawn from a group of people that is easy to contact or reach. Researchers choose these samples simply because they are easy to recruit, without considering whether the sample represents the entire population.
Example: A researcher wants to survey individuals about which smartphone brand they prefer to use. The researcher considers a sample size of 500 respondents and is interested in surveying only ten states in the US. The quotas are:
Gender: 250 male and 250 female; Age: 100 respondents in each of the age groups 16-20, 21-30, 31-40, 41-50, and 51+.
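The quotas in this example can be filled with a short Python sketch. The function name `quota_sample` and the respondent stream are hypothetical: the stream stands for people in the order an interviewer happens to reach them, which is why quota sampling is non-probability sampling. A respondent is accepted only if both their gender quota and their age quota still have room.

```python
from collections import Counter

def quota_sample(respondents, quotas):
    """Accept respondents in the order they are reached while every quota they
    fall into still has room; stop once all quotas are filled.

    quotas: dict mapping a field name -> {category: target count}
    """
    remaining = {field: dict(targets) for field, targets in quotas.items()}
    accepted = []
    for person in respondents:
        if all(remaining[f].get(person[f], 0) > 0 for f in remaining):
            accepted.append(person)
            for f in remaining:
                remaining[f][person[f]] -= 1
            if all(v == 0 for targets in remaining.values() for v in targets.values()):
                break
    return accepted

age_groups = ["16-20", "21-30", "31-40", "41-50", "51+"]
quotas = {"gender": {"male": 250, "female": 250},
          "age": {g: 100 for g in age_groups}}

# Made-up stream of people in the order the interviewer reaches them
stream = ({"gender": "male" if i % 2 == 0 else "female",
           "age": age_groups[(i // 2) % 5]} for i in range(5000))

sample = quota_sample(stream, quotas)
print(len(sample), Counter(p["gender"] for p in sample), Counter(p["age"] for p in sample))
```

With real respondents a greedy fill like this can stall (for example, if the remaining people all fall into full quotas), in which case it returns fewer than 500 people.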
Assignment I
1.2 Sampling Distribution
In inferential statistics, we want to use characteristics of the sample (i.e., a statistic) to estimate the characteristics of the population (i.e., a parameter).
The sampling distribution of the sample mean $\bar{X}$ is the probability distribution of all possible values the random variable $\bar{X}$ may assume when a sample of size n is taken from a specified population.
1. From a finite population of size N, draw all possible samples of size n.
What is the sampling distribution of the sample means for samples of size 2?
b. To get the sampling distribution of the sample mean, use the combination rule: NCn = 5C2 = 10. There are therefore 10 possible samples of size 2 (drawn without replacement) that can be taken from the population, each giving one sample mean.
c. The mean of the sampling distribution
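The enumeration in steps b and c can be checked on a small example. The notes' own population values are not reproduced here, so this Python sketch assumes a made-up population of N = 5 values {2, 4, 6, 8, 10} (population mean 6, population variance 8). It lists all 5C2 = 10 samples drawn without replacement, confirms that the mean of the 10 sample means equals the population mean, and shows that without replacement the variance of the sample mean is (σ²/n) × (N − n)/(N − 1) = 4 × 3/4 = 3.

```python
from itertools import combinations
from math import comb
from statistics import mean, pvariance

# Hypothetical population of N = 5 values (not the population used in the notes)
population = [2, 4, 6, 8, 10]
N, n = len(population), 2

samples = list(combinations(population, n))   # every sample of size 2, drawn without replacement
sample_means = [mean(s) for s in samples]
print(len(samples), comb(N, n))               # 10 10
print(sample_means)

# Mean of the sampling distribution equals the population mean (both 6 here)
print(mean(sample_means), mean(population))

# Without replacement, the variance of the sample mean is (sigma^2 / n) * (N - n) / (N - 1)
sigma2 = pvariance(population)                # population variance = 8
print(pvariance(sample_means), sigma2 / n * (N - n) / (N - 1))   # both 3
```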
Relationships between Population Parameters and the Sampling Distribution of the Sample
Mean
The expected value of the sample mean is equal to the population mean:
$E(\bar{X}) = \mu_{\bar{X}} = \mu_X$
The variance of the sample mean is equal to the population variance divided by the sample
size:
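$V(\bar{X}) = \sigma_{\bar{X}}^2 = \dfrac{\sigma_X^2}{n}$
This holds when sampling with replacement or from an infinite population. When sampling without replacement from a finite population of size N, the variance is multiplied by the finite population correction $\dfrac{N-n}{N-1}$.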
The standard deviation of the sample mean, known as the standard error of the mean, is
equal to the population standard deviation divided by the square root of the sample size:
$SD(\bar{X}) = \sigma_{\bar{X}} = \dfrac{\sigma_X}{\sqrt{n}}$
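The three relationships can be checked exactly on a small made-up population. This Python sketch assumes sampling with replacement, so all 5² = 25 ordered samples of size 2 from {2, 4, 6, 8, 10} are equally likely and the draws are independent; that is the setting in which these formulas hold exactly (sampling without replacement from a finite population would add the finite population correction (N − n)/(N − 1) to the variance).

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance

# Hypothetical population (made-up values): mu = 6, sigma^2 = 8
population = [2, 4, 6, 8, 10]
n = 2

# All 5**2 = 25 ordered samples drawn WITH replacement, each equally likely
sample_means = [mean(s) for s in product(population, repeat=n)]

mu, sigma2 = mean(population), pvariance(population)
print(mean(sample_means), mu)                     # E(X-bar) = mu: both 6
print(pvariance(sample_means), sigma2 / n)        # V(X-bar) = sigma^2 / n: both 4
print(pstdev(sample_means), sqrt(sigma2) / sqrt(n))   # SD(X-bar) = sigma / sqrt(n): both 2
print(isclose(pstdev(sample_means), sqrt(sigma2) / sqrt(n)))   # True
```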
Assignment II
2. Discuss the sampling technique you have been assigned (or have chosen).
3. Taking a real-life problem in your field of study, discuss how to select a sample using the sampling technique you have chosen.