RM Unit-II

INTRODUCTION:

Successful statistical practice is based on focused problem definition. In sampling, this includes defining the population from which the sample is drawn. A population is a well-defined collection of individuals or objects known to have similar characteristics. It is the aggregate of all elements, usually defined prior to the selection of the sample. The population is said to be completely defined only when the elements, sampling units, extent and time are specified.
A complete enumeration of all items in the 'population' is known as a census inquiry. It is the process of obtaining responses from/about each of the members of the population. It can be presumed that in such an inquiry, when all items are covered, no element of chance is left and the highest accuracy is obtained, but in practice this may not be true. Even the slightest element of bias in such an inquiry will grow larger and larger as the number of observations increases. Moreover, there is no way of checking the element of bias or its extent except through a resurvey or the use of sample checks.
A sample is a subset of a population that is used to represent the group as a whole. Sampling is the selection of a part of the universe for the purpose of drawing conclusions or inferences about the entire universe from the study of that part. In certain circumstances sampling may be the only possible or practicable method of obtaining the desired information, for example when the universe is infinite, very large or complex; in such cases a sample is the only way to study it. It is said that a carefully designed sample may be better than a poorly planned and executed census. In the case of sampling, the cost of data collection is also much less than for a census.
Census and sample
Studying a population

A population may be studied using one of two approaches: taking a census, or selecting a sample.

It is important to note that whether a census or a sample is used, both provide information that can be used to
draw conclusions about the whole population.

Census: complete enumeration


A census is a study of every unit, everyone or everything, in a population. It is known as a complete
enumeration, which means a complete count.

Sample: partial enumeration

A sample is a subset of units in a population, selected to represent all units in a population of interest. It is a
partial enumeration because it is a count from part of the population.

Information from the sampled units is used to estimate the characteristics for the entire population of interest.

Choosing between a census or a sample

Once a population has been identified, a decision needs to be made about whether taking a census or selecting a
sample will be the more suitable option. There are advantages and disadvantages to using a census or sample
to study a population:

Advantages and disadvantages of taking a census or sample


Advantages of a census:
 provides a true measure of the population (no sampling error)
 benchmark data may be obtained for future studies
 detailed information about small sub-groups within the population is more likely to be available

Disadvantages of a census:
 may be difficult to enumerate all units of the population within the available time
 higher costs, both in staff and monetary terms, than for a sample
 generally takes longer to collect, process, and release data than from a sample

Advantages of a sample:
 costs would generally be lower than for a census
 results may be available in less time
 if good sampling techniques are used, the results can be very representative of the actual population

Disadvantages of a sample:
 data may not be representative of the total population, particularly where the sample size is small
 often not suitable for producing benchmark data
 as data are collected from a subset of units and inferences made about the whole population, the data are subject to 'sampling' error
 decreased number of units will reduce the detailed information available about sub-groups within a population

Census is the process of collecting data from every member of a population, while sampling is the process of
collecting data from a subset of a population. In a census, every member of a population is included, while in
sampling, a smaller group of individuals is selected to represent the population as a whole.

Census vs Sampling:
• A census involves collecting data from every single member of a population; sampling involves collecting data from a subset or selected group of the population.
• A census requires a large amount of resources and time to conduct the survey and gather data; sampling requires fewer resources and is quicker to conduct as it only involves a specific group of the population.
• A census provides a complete and accurate representation of the population as it covers all the members; sampling provides an estimate or a general idea of the population based on the sample selected.
• A census can be more expensive than sampling as it involves collecting data from every member of the population; sampling is generally less expensive as it only involves a specific group of the population.
• A census can be useful for small populations or when detailed information is needed about the whole population; sampling can be useful for large populations or when a general overview is needed.
• In a census the margin of error is typically very small as it covers the whole population; in sampling the margin of error is typically larger as the sample size is smaller than the population size.
Sample: The sample is a part or a small section of the population selected for study.
Sampling: It is the procedure of selecting a sample from the population.
Sampling Frame: A set of information used to identify a sample population for
statistical treatment. A sampling frame includes a numerical identifier for each
individual, plus other identifying information about characteristics of the individuals, to
aid in analysis and allow for division into further frames for more in-depth analysis.
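As a rough illustration of this idea (the field names and records below are purely hypothetical), a sampling frame can be thought of as a list of records, one per unit, each carrying a numerical identifier and auxiliary characteristics. A minimal Python sketch:

# A minimal sketch of a sampling frame: one record per unit, with a
# numerical identifier plus characteristics that allow the frame to be
# divided into further frames for more in-depth analysis.
sampling_frame = [
    {"id": 1, "name": "Unit A", "gender": "F", "region": "North"},
    {"id": 2, "name": "Unit B", "gender": "M", "region": "South"},
    {"id": 3, "name": "Unit C", "gender": "F", "region": "South"},
    # ... one record for every unit in the target population
]

# Dividing the frame into a sub-frame for deeper analysis:
south_frame = [unit for unit in sampling_frame if unit["region"] == "South"]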
ESSENTIALS OF GOOD SAMPLING:
In order to reach at right conclusions, a sample must possess the following
essential characteristics.
1. Representative: The sample should truly represent the characteristics of the universe. For this, the investigator should be free from bias and the method of collection should be appropriate.
2. Adequacy: The size of the sample should be adequate, i.e., neither too large nor too small, but commensurate with the size of the population.
3. Homogeneity: There should be homogeneity in the nature of all the units selected for the sample. If the units of the sample are of a heterogeneous character, it will be impossible to make a comparative study with them.
4. Independence: The method of selection of the sample should be such that the items of the sample are selected in an independent manner. This means that the selection of one item should not influence the selection of another item in any manner, and that each item should be selected on the basis of its own merit.
NEED OF SAMPLING:
Sampling is necessary because of the following reasons
• It is not technically or economically feasible to take the entire population into
consideration.
• Due to dynamic changes in the business, industrial and social environment, it is necessary to make quick decisions, so sampling is necessary to save time.
• If data collection takes a long time, the value of some characteristics may change over that period, thus defeating the very purpose of data analysis. Thus, because of the importance of the time element, a sample is required.
• A sample, if representative, may yield more accurate results than a total census because a sample can be more closely supervised.
STEPS IN SAMPLING PROCESS:
The steps involved in the sampling procedure are as follows:

1. Identify the population of interest(Target population) :

 Target population refers to the group of individuals or objects to which researchers are interested in
generalizing their findings.
 A well-defined population reduces the likelihood of including individuals or objects that do not belong to the target group. A sample is taken from the target population.

2. Select a sampling frame:

 The sampling frame is the group of individuals or objects from which the researcher will draw the
sample.
 It is the actual list of all units in a target population from which the sample is taken.

3. Specify the sampling technique:

 Sampling can be done by two techniques: probability (random selection) or non-probability (non-
random) technique.
 Now, if the sampling frame is approximately the same as the target population, random selection may
be used to select samples.

4. Determine the sample size:

 The sample size is defined as the number of units in the sample. Sample size determination depends on
many factors such as time, cost, and facility.

5. Execute the sampling plan:

 Once population, sampling frame, sampling technique, and sample size are identified, the researcher
can use all that information to execute the sampling plan and collect the data required for the research.

METHODS OR TECHNIQUES OF SAMPLING:


In statistics, the sampling method or sampling technique is the process of studying the population by gathering information from a sample and analyzing that data. It is the basis of data collection when the sample space is enormous. There are several different sampling techniques available, and they can be subdivided into two groups. Some of these sampling methods may involve specifically targeting hard-to-reach groups.

Types of Sampling Method

In Statistics, there are different sampling techniques available to get relevant results from the population. The
two different types of sampling methods are:

 Probability Sampling
 Non-probability Sampling

What is Probability Sampling?

The probability sampling method utilizes some form of random selection. In this method, every eligible individual in the sample space has a chance of being selected in the sample. This method is more time-consuming and expensive than the non-probability sampling method. The benefit of using probability sampling is that it helps ensure the sample is representative of the population.

Probability Sampling Types

Probability Sampling methods are further classified into different types, such as simple random sampling,
systematic sampling, stratified sampling, and clustered sampling. Let us discuss the different types of
probability sampling methods along with illustrative examples here in detail.

Simple Random Sampling

In the simple random sampling technique, every item in the population has an equal chance of being selected in the sample. Since the item selection depends entirely on chance, this method is known as the "method of chance selection". When the sample size is large and the items are chosen randomly, it is also known as "representative sampling".

Example:
Suppose we want to select a simple random sample of 200 students from a school of 500 students. Here, we can assign a number from 1 to 500 to every student in the school database and use a random number generator to select a sample of 200 numbers.
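A minimal sketch of this example, assuming the 500 students are identified simply by the numbers 1 to 500, using Python's standard library:

import random

students = list(range(1, 501))         # hypothetical student IDs 1 to 500
sample = random.sample(students, 200)  # each student has an equal chance
print(len(sample))                     # 200 distinct students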

Systematic Sampling

In the systematic sampling method, items are selected from the target population by choosing a random starting point and then selecting subsequent members at a fixed sampling interval. The interval is calculated by dividing the total population size by the desired sample size.

Example:

Suppose the names of 300 students of a school are sorted in reverse alphabetical order. To select a sample of 20 students using systematic sampling, the sampling interval is 300 / 20 = 15. We randomly select a starting number between 1 and 15, say 5, and from number 5 onwards select every 15th person from the sorted list (5, 20, 35, ...), ending up with a sample of 20 students.
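A small sketch of the same procedure, assuming a sorted list of 300 students and a desired sample of 20 (so the interval is 15):

import random

students = list(range(1, 301))            # 300 students in sorted order
sample_size = 20
interval = len(students) // sample_size   # 300 // 20 = 15

start = random.randint(1, interval)       # random starting point, e.g. 5
sample = students[start - 1::interval]    # every 15th student: 5, 20, 35, ...
print(len(sample))                        # 20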

Stratified Sampling

In a stratified sampling method, the total population is divided into smaller groups to complete the sampling
process. The small group is formed based on a few characteristics in the population. After separating the
population into a smaller group, the statisticians randomly select the sample.

For example, there are three bags (A, B and C), each with different balls. Bag A has 50 balls, bag B has 100
balls, and bag C has 200 balls. We have to choose a sample of balls from each bag proportionally. Suppose 5
balls from bag A, 10 balls from bag B and 20 balls from bag C.
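A brief sketch of proportional allocation for this three-bag example (a 10% sampling fraction from each stratum), under the assumption that the balls are simply numbered:

import random

bags = {"A": list(range(50)), "B": list(range(100)), "C": list(range(200))}
fraction = 0.10                            # proportional sampling fraction

stratified_sample = {
    name: random.sample(balls, int(len(balls) * fraction))
    for name, balls in bags.items()
}
# Gives 5 balls from bag A, 10 from bag B and 20 from bag C.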

Clustered Sampling

In the clustered sampling method, clusters or groups of people are formed from the population set. Each group has similar significant characteristics, and each has an equal chance of being part of the sample. This method uses simple random sampling to select clusters from the population.

Example:
An educational institution has ten branches across the country, each with almost the same number of students. If we want to collect some data regarding facilities and other things, we can't travel to every unit to collect the required data. Hence, we can use random sampling to select three or four branches as clusters.
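A minimal sketch of this branch example, selecting a few clusters at random and then surveying the units within them (branch and student names are hypothetical):

import random

# Ten branches (clusters), each with roughly the same number of students.
branches = {f"Branch-{i}": [f"student-{i}-{j}" for j in range(1, 101)]
            for i in range(1, 11)}

selected = random.sample(list(branches), 3)   # pick 3 branches as clusters
# Survey every student in the selected clusters (one-stage cluster sampling).
respondents = [s for b in selected for s in branches[b]]
print(len(respondents))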


What is Non-Probability Sampling?

The non-probability sampling method is a technique in which the researcher selects the sample based on subjective judgment rather than random selection. In this method, not all members of the population have a chance to participate in the study.

Non-Probability Sampling Types

Non-probability Sampling methods are further classified into different types, such as convenience sampling,
consecutive sampling, quota sampling, judgmental sampling, snowball sampling. Here, let us discuss all these
types of non-probability sampling in detail.

Convenience Sampling

In the convenience sampling method, the samples are selected from the population directly because they are conveniently available to the researcher. The samples are easy to select, but the researcher does not choose a sample that represents the entire population.

Example:

In researching customer support services in a particular region, we ask a few customers to complete a survey on the products after purchase. This is a convenient way to collect data, but since we only surveyed customers who bought the same product, the sample is not representative of all the customers in that area.

Consecutive Sampling

Consecutive sampling is similar to convenience sampling with a slight variation. The researcher picks a single
person or a group of people for sampling. Then the researcher researches for a period of time to analyze the
result and move to another group if needed.
Quota Sampling

In the quota sampling method, the researcher forms a sample of individuals chosen to represent the population based on specific traits or qualities. The researcher chooses sample subsets that provide a useful collection of data that can be generalized to the entire population.

Purposive or Judgmental Sampling

In purposive sampling, the samples are selected only based on the researcher's knowledge. As their knowledge is instrumental in creating the samples, there is a chance of obtaining highly accurate answers with a minimum margin of error. It is also known as judgmental sampling or authoritative sampling.

Snowball Sampling

Snowball sampling is also known as a chain-referral sampling technique. In this method, the samples have
traits that are difficult to find. So, each identified member of a population is asked to find the other sampling
units. Those sampling units also belong to the same targeted population.

Probability Sampling Vs Non-Probability Sampling Methods

The below table shows a few differences between probability sampling methods and non-probability sampling
methods.

• Probability sampling is a sampling technique in which samples taken from a larger population are chosen based on probability theory; non-probability sampling is a technique in which the researcher chooses samples based on subjective judgment rather than random selection.
• Probability sampling methods are also known as random sampling methods; non-probability methods are also called non-random sampling methods.
• Probability sampling is used for research which is conclusive; non-probability sampling is used for research which is exploratory.
• Probability sampling involves a longer time to get the data; non-probability sampling is an easier way to collect data quickly.
• In probability sampling there is an underlying hypothesis before the study starts, and the objective of the method is to validate the defined hypothesis; in non-probability sampling the hypothesis is derived later by conducting the research study.
SAMPLING AND NON-SAMPLING ERRORS:

The key difference between sampling and non-sampling error is that sampling error is the error
that arises from taking a sample from a larger population, while non-sampling error is error that
arises from other sources, such as errors in data collection or data entry. Sampling error is the
deviation of the selected sample from the true characteristics, traits, behaviours, qualities or
figures of the entire population. Various forces combine to produce deviations of sample
statistic from population parameters, and errors, in accordance with the different cause, are
classified into sampling and non-sampling error.
What are Sampling Errors?

Sampling errors are statistical errors that arise when a sample does not represent the whole population. They
are the difference between the real values of the population and the values derived by using samples from the
population. Sampling errors occur when numerical parameters of an entire population are derived from a
sample of the entire population. Since the whole population is not included in the sample, the parameters
derived from the sample differ from those of the actual population.

Sampling Errors Explained

Sampling errors are deviations in the sampled values from the values of the true population emanating from
the fact that a sample is not an actual representative of a population of data.

When there is a fault in data collection, the results obtained from sampling become invalid. Furthermore, when a sample is not selected randomly, or the selection is based on bias, it fails to represent the whole population, and sampling errors will certainly occur.

They can be prevented if the analysts select subsets or samples of data to represent the whole population
effectively. Sampling errors are affected by factors such as the size and design of the sample,
population variability, and sampling fraction.

Increasing the size of samples can reduce sampling errors; however, to reduce them by half, the sample size needs to be increased four times. If the selected samples are small and do not adequately represent the whole data, the analysts can select a greater number of samples for satisfactory representation.
The population variability causes variations in the estimates derived from different samples, leading to larger
errors. The effect of population variability can be reduced by increasing the size of the samples so that these
can more effectively represent the population.

Moreover, sampling errors must be considered when publishing survey results so that the accuracy of the
estimates and the related interpretations can be established.

Practical Example

Suppose the producers of Company XYZ want to determine the viewership of a local program that airs twice a
week. The producers will need to determine the samples that can represent various types of viewers. They may
need to consider factors like age, level of education, and gender.

For example, people between the ages of 14 and 18 usually have fewer commitments, and most of them can
spare time to watch the program twice weekly. On the contrary, people between the age of 18 and 35 usually
have tighter schedules and will not have time to watch TV.

Hence, it is important to draw a sample proportionately. Otherwise, the results will not represent the real
population.

Since the exact population parameter is not known, sampling errors for samples are generally unknown.
However, analysts can use analytical methods to measure the amount of variation caused by sampling errors.
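One common analytical measure of this variation is the standard error of a sample mean, sigma / sqrt(n). A small sketch (with an assumed population standard deviation) also illustrates the earlier point that the sample size must be increased four times to cut the sampling error in half:

import math

def standard_error(sigma, n):
    # Standard error of the sample mean: sigma / sqrt(n).
    return sigma / math.sqrt(n)

sigma = 10.0                        # assumed population standard deviation
print(standard_error(sigma, 100))   # 1.0
print(standard_error(sigma, 400))   # 0.5 -> 4x the sample, half the error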
Categories or Reasons of Sampling Errors

 Population Specification Error – Happens when the analysts do not understand who to survey. For
example, for a survey of breakfast cereals, the population can be the mother, children, or the entire
family.
 Selection Error – Occurs when the respondents’ survey participation is self-selected, implying only
those who are interested respond. Selection errors can be reduced by encouraging participation.
 Sample Frame Error – Occurs when a sample is selected from the wrong population data.
 Non-Response Error – Occurs when a useful response is not obtained from the surveys. It may
happen due to the inability to contact potential respondents or their refusal to respond.

NON-SAMPLING ERROR

Non-sampling error refers to all sources of error that are unrelated to sampling. Non-sampling errors are
present in all types of survey, including censuses and administrative data. They arise for a number of reasons:
the frame may be incomplete, some respondents may not accurately report data, data may be missing for some
respondents, etc.

Non-sampling errors can be classified into two groups: random errors and systematic errors.

 Random errors are errors whose effects approximately cancel out if a large enough sample is used,
leading to increased variability.
 Systematic errors are errors that tend to go in the same direction, and thus accumulate over the entire
sample leading to a bias in the final results. Unlike random errors, this bias is not reduced by increasing
the sample size. Systematic errors are the principal cause of concern in terms of a survey’s data quality.
Unfortunately, non-sampling errors are often extremely difficult, if not impossible, to measure.

Types of non-sampling error

Non-sampling error can occur in all aspects of the survey process, and can be classified into the following
categories: coverage error, measurement error, non-response error and processing error.
Coverage error

Coverage error consists of omissions (undercoverage), erroneous inclusions, duplications and misclassifications (overcoverage) of units in the survey frame. Since it affects every estimate produced by the survey, it is one of the most important types of error. In the case of a census, it may be the main source of error. Coverage error can have both spatial and temporal dimensions, and may cause bias in the estimates. The effect can vary for different subgroups of the population. This error tends to be systematic and is usually due to undercoverage, which is why it is important to reduce it as much as possible.

Measurement error

Measurement error, also called response error, is the difference between measured values and true values. It
consists of bias and variance, and it results when data are incorrectly requested, provided, received or
recorded. These errors may occur because of inefficiencies with the questionnaire, the interviewer, the
respondent or the survey process.

 Poor questionnaire design: It is essential that sample survey or census questions are worded carefully
in order to avoid introducing bias. If questions are misleading or confusing, then the responses may end
up being distorted.
 Interviewer bias: An interviewer can influence how a respondent answers the survey questions. This
may occur when the interviewer is too friendly or aloof or prompts the respondent. To prevent this,
interviewers must be trained to remain neutral throughout the interview. They must also pay close
attention to the way they ask each question. If an interviewer changes the way a question is worded, it
may impact the respondent’s answer.
 Respondent error: Respondents can also provide incorrect answers. Faulty recollections, tendencies to
exaggerate or underplay events, and inclinations to give answers that appear more socially acceptable
are several reasons why a respondent may provide a false answer.
 Problems with the survey process: Errors can also occur because of a problem with the actual survey
process. Using proxy responses, meaning taking answers from someone other than the respondent, or
lacking control over the survey procedures are just a few ways of increasing the risk of response errors.

Non-response error

Estimates obtained after nonresponse has been observed and imputation has been used to deal with this
nonresponse are usually not equivalent to the estimates that would have been obtained had all the desired
values been observed without error. The difference between these two types of estimates is called
the nonresponse error. There are two types of non-response errors: total and partial.

 Total nonresponse error occurs when all or almost all data for a sampling unit are missing. This can
happen if the respondent is unavailable or temporarily absent, the respondent is unable to participate or
refuses to participate in the survey, or if the dwelling is vacant. If a significant number of sampled units
do not respond to a survey, then the results may be biased since the characteristics of the non-
respondents may differ from those who have participated.
 Partial nonresponse error occurs when respondents provide incomplete information. For certain people, some questions may be difficult to understand, or they may refuse or forget to answer a question. A poorly designed questionnaire or poor interviewing techniques can also result in partial nonresponse error. To reduce this form of error, care should be taken in designing and testing questionnaires. Adequate interviewer training and appropriate edit and imputation strategies will also help minimize this error.

Processing error

Processing error occurs during data processing. It includes all data processing activities after collection and
prior to estimation, such as errors in data capture, coding, editing and tabulation of the data as well as in the
assignment of survey weights.

 Coding errors occur when different coders code the same answer differently, which can be caused by
poor training, incomplete instructions, variance in coder performance (i.e. tiredness, illness), data entry
errors, or machine malfunction (some processing errors are caused by errors in the computer
programs).
 Data capture errors result when data are not entered into the computer exactly as they appear on the
questionnaire. This can be caused by the complexity of alphanumeric data and by the lack of clarity in
the answer provided. The physical layout of the questionnaire itself or the coding documents can cause
data capture errors. The method of data capture, manual or automated (for example, using an optical
scanner), can also result in errors.
 Editing and imputation errors can be caused by the poor quality of the original data or by its
complex structure. When the editing and imputation processes are automated, errors can also be the
result of faulty programs that were insufficiently tested. The choice of an inappropriate imputation
method can introduce bias. Errors can also result from incorrectly changing data that were found to be
in error, or by erroneously changing correct data.

DETERMINATION OF SAMPLE SIZE

What is Sample Size?

'Sample size' is a market research term used for defining the number of individuals included in conducting research. Researchers choose their sample based on demographics, such as age, gender, or physical location. It can be vague or specific.

For example, you may want to know what people within the 18-25 age range think of your product. Or, you
may only require your sample to live in the United States, giving you a wide population range. The total
number of individuals in a particular sample is the sample size.

What is sample size determination?

Sample size determination is the process of choosing the right number of observations or people from a larger
group to use in a sample. The goal of figuring out the sample size is to ensure that the sample is big enough to
give statistically valid results and accurate estimates of population parameters but small enough to be
manageable and cost-effective.

In many research studies, getting information from every member of the population of interest is not possible
or useful. Instead, researchers choose a sample of people or events that is representative of the whole to study.
How accurate and precise the results are can depend a lot on the size of the sample.

Choosing the statistically significant sample size depends on a number of things, such as the size of the
population, how precise you want your estimates to be, how confident you want to be in the results, how
different the population is likely to be, and how much money and time you have for the study. Statistics are
often used to figure out how big a sample should be for a certain type of study and research question.

Figuring out the sample size is important in ensuring that research findings and conclusions are valid and
reliable.

Why do you need to determine the sample size?


Let’s say you are a market researcher in the US and want to send out a survey or questionnaire. The survey
aims to understand your audience’s feelings toward a new cell phone you are about to launch. You want to
know what people in the US think about the new product to predict the phone’s success or failure before
launch.

Hypothetically, you choose the population of New York, which is 8.49 million. You use a sample size
determination formula to select a sample of 500 individuals that fit into the consumer panel requirement. You
can use the responses to help you determine how your audience will react to the new product.

However, determining a sample size requires more than just throwing your survey at as many people as
possible. If your estimated sample sizes are too big, it could waste resources, time, and money. A sample size
that’s too small doesn’t allow you to gain maximum insights, leading to inconclusive results.

What are the terms used around the sample size?

Before we jump into sample size determination, let’s take a look at the terms you should know:

1. Population size:

Population size is how many people fit your demographic. For example, you want to get information on
doctors residing in North America. Your population size is the total number of doctors in North
America. Don’t worry! Your population size doesn’t always have to be that big. Smaller population sizes can
still give you accurate results as long as you know who you’re trying to represent.

2. Confidence level:

The confidence level tells you how sure you can be that your data is accurate. It is expressed as a percentage and aligned to the confidence interval. For example, a 90% confidence level means that if the survey were repeated many times, about 90% of the resulting estimates would fall within the margin of error of the true population value.

3. The margin of error (confidence interval):

There's no way to be 100% accurate when it comes to surveys. Confidence intervals tell you how far off from the population mean you're willing to allow your data to fall.
A margin of error describes how close you can reasonably expect a survey result to fall relative to the real population value.

4. Standard deviation:

Standard deviation is the measure of the dispersion of a data set from its mean. It measures the absolute
variability of a distribution. The higher the dispersion or variability, the greater the standard deviation and the
greater the magnitude of the deviation.

For example, you have already sent out your survey. How much variance do you expect in your responses?
That variation in response is the standard deviation.

WHY DOES SAMPLE SIZE MATTER?

When you survey a large population of respondents, you’re interested in the entire group, but it’s not
realistically possible to get answers or results from absolutely everyone. So you take a random sample of
individuals which represents the population as a whole. The size of the sample is very important for getting
accurate, statistically significant results and running your study successfully.

 If your sample is too small, you may include a disproportionate number of individuals which are
outliers and anomalies. These skew the results and you don’t get a fair picture of the whole population.

 If the sample is too big, the whole study becomes complex, expensive and time-consuming to run, and
although the results are more accurate, the benefits don’t outweigh the costs.

Sample size calculation formula – sample size determination

With all the necessary terms defined, it’s time to learn how to determine sample size using a sample
calculation formula.

Your confidence level corresponds to a Z-score. This is a constant value needed for this equation. Here are the
z-scores for the most common confidence levels:

90% – Z Score = 1.645


95% – Z Score = 1.96

99% – Z Score = 2.576

If you choose a different confidence level, various online tools can help you find your score.

Necessary Sample Size = (Z-score)² * StdDev * (1 - StdDev) / (margin of error)²

Here is an example of how the math works, assuming you chose a 90% confidence level, a standard deviation (estimated proportion) of .6, and a margin of error (confidence interval) of +/- 4%.

((1.645)² x .6 x (1 - .6)) / (.04)²

= (2.706 x .24) / .0016

= .6494 / .0016

≈ 406

Approximately 406 respondents are needed, and that becomes your sample size.
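The same calculation can be written as a small Python helper (the function name is illustrative; it simply implements the formula above):

import math

def necessary_sample_size(z_score, p, margin_of_error):
    # Necessary sample size = z^2 * p * (1 - p) / e^2, rounded up.
    return math.ceil((z_score ** 2) * p * (1 - p) / (margin_of_error ** 2))

# 90% confidence (z = 1.645), estimated proportion 0.6, margin of error 4%
print(necessary_sample_size(1.645, 0.6, 0.04))   # -> 406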

DATA COLLECTION METHODS:

INTRODUCTION:

Once the researcher has decided on the research design, the next job is data collection. Data collection starts with determining what kind of data is required, followed by the selection of a sample from a certain population. The reliability of managerial decisions depends on the quality of data. The quality of data can be expressed in terms of how well it represents reality, which can be ensured by the use of a fitting data collection method. Depending upon the source utilized, whether the data come from actual observations or from records that are kept for normal purposes, statistical data can be classified into two categories, i.e. primary and secondary:

PRIMARY DATA:

Primary data is a type of information that is obtained directly from first-hand sources by
means of surveys, observation or experimentation. It is data that has not been previously
published and is derived from a new or original research study and collected at the source
such as in marketing.

ADVANTAGES OF PRIMARY DATA:

Advantages of primary data are as follows:

• The primary data are original and relevant to the topic of the research study so
the degree of accuracy is very high.

• Primary data can be collected in a number of ways, such as interviews, telephone surveys, focus groups etc. It can also be collected across national borders through emails and posts. It can include a large population and wide geographical coverage.

• Moreover, primary data is current and it can better give a realistic view to the
researcher about the topic under consideration.

• Reliability of primary data is very high because these are collected by the
concerned and reliable party.

DISADVANTAGES OF PRIMARY DATA:

Following are the disadvantages of primary data:


• For collection of primary data where an interview is to be conducted, the coverage is limited, and for wider coverage a greater number of researchers is required.

• A lot of time and effort is required for data collection. By the time the data are collected and analyzed and the report is ready, the research problem may have become outdated, so the purpose of the research may be defeated.

• It has design problems, such as how to design the surveys. The questions must be simple to understand and respond to.

• Some respondents do not give timely responses. Sometimes, the respondents may give fake, socially acceptable and sweet answers and try to cover up the realities. With more people, time and effort involved, the cost of data collection goes up, and the importance of the research may go down.

• In some primary data collection methods there is no control over the data collection. An incomplete questionnaire always gives a negative impact on the research.

METHODS OF PRIMARY DATA COLLECTION:

Observation Method

In the observation method, the investigator collects data through personal observation. In this method the investigator observes the behaviour of the respondents in disguise. Under the observation method, information is sought by way of the investigator's own direct observation, without asking the respondents.

Advantages of Observation

1. Very direct method for collecting data or information – best for the study of
human behaviour.

2. Data collected is very accurate in nature and also very reliable.

3. Improves precision of the research results.

4. Problem of depending on respondents is decreased.

5. Helps in understanding the verbal response more efficiently.

6. Observation is less demanding of the subjects, which makes it less prone to bias.
Disadvantages of Observation

• Problems of the past cannot be studied by means of observation.

• Having no other option one has to depend on the documents available.

• Observations like controlled observations require some special instruments or tools for effective working, which are very costly.

• Attitudes cannot be studied with the help of observations.

• Complete answer to any problem or any issue cannot be obtained by


observation alone.

Questionnaire Method

The questionnaire is most frequently a very concise, preplanned set of questions


designed to yield specific information to meet a particular need for research
information about a pertinent topic. The research information is attained from
respondents normally from a related interest area. A questionnaire is a written or printed
form used in gathering information on some subject or subjects consisting of a list of
questions to be submitted to one or more persons. As a data collecting instrument, it
could be structured, unstructured or semi- structured questionnaire.
A structured questionnaire is one in which the questions asked are precisely
decided in advance. When used as an interviewing method, the questions are asked
exactly as they are written, in the same sequence, using the same style, for all
interviews. Nonetheless, the structured questionnaire can sometimes be left a bit open
for the interviewer to amend to suit a specific context.

An unstructured questionnaire on the other hand, is an instrument or guide used by


an interviewer who asks questions about a particular topic or issue. Although a
question guide is provided for the interviewer to direct the interview, the specific
questions and the sequence in which they are asked are not precisely determined in
advance.

A semi- structured questionnaire is a mix of unstructured and structured


questionnaires. Some of the questions and their sequence are determined in advance,
while others evolve as the interview proceeds.

Advantages of Questionnaires

• It is free from the bias of the interviewer; answers are in the respondents' own words.

• Respondents have adequate time to give well thought out answers.

• There is low cost even when the universe is large and widely spread geographically.

• Respondents who are not easily approachable can also be reached conveniently.

• Large samples can be made use of and thus the results can be made more dependable and reliable.

Disadvantage of Questionnaire

• It can be used only when respondents are educated and cooperative.

• Control over the questionnaire may be lost once it is sent.

• There is inbuilt inflexibility because of the difficulty of amending the approach once questionnaires have been dispatched.

• It is difficult to know whether willing respondents are truly representative.

• This method is likely to be the slowest of all.

Interview Method

In this method the interviewer personally meets the informants and asks
necessary questions to them regarding the subject of enquiry. Usually a set of questions
or a questionnaire is carried by him and questions are also asked according to that.
The interviewer efficiently collects the data from the informants by cross examining
them. The interviewer must be very efficient and tactful to get the accurate and relevant
data from the informants. Interviews like personal interview/depth interview or
telephone interview can be conducted as per the need of the study.

This method acts as a very vital tool for the collection of the data in the social
research as it is all about the direct systematic conversation between an interviewer and
the respondent. By this the interviewer is able to get relevant information for a
particular research problem.

Personal interviews

Under this method there is face-to-face contact between the interviewer and the interviewee. This sort of interview may be in the form of direct personal investigation or it may be indirect oral investigation. In the case of direct personal investigation the interviewer has to collect the information personally from the sources concerned. He has to be on the spot and has to meet people from whom data have to be collected. This method is particularly suitable for intensive investigations.

Telephone interview

Telephone interview involves trained interviewers phoning people to collect


questionnaire data. This method is quicker and less expensive than face-to-face
interviewing. However, only people with telephones can be interviewed, and the
respondent can end the interview very easily

Advantages of the Interview Method

• Very good technique for getting information about complex, emotionally laden subjects.

• Can be easily adapted to the ability of the person being interviewed.

• Yields a good percentage of returns.

• Yields a fairly representative sample of the general population.

• Data collected by this method is likely to be more correct compared to the other methods that are used for data collection.

Disadvantages of the Interview Method

• Time consuming process.

• Involves high cost.

• Requires highly skilled interviewer.

• Requires more energy


• More confusing and a very complicated method.

Mail Survey

It is a common method of conducting surveys. It is a relatively inexpensive


method of collecting data, and one that can distribute large numbers of questionnaires in
a short time. It provides the opportunity to contact hard-to-reach people, and
respondents are able to complete the questionnaire in their own time. Mail surveys do
require an up-to-date list of names and addresses. In addition, there is also the need to
keep the questionnaire simple and straightforward. A major disadvantage of a mail survey is that it usually has lower response
rates than other data collection methods. This may lead to problems with data quality.

Collection of Data through Schedule


This method of data collection is very much like the collection of data through questionnaires, with the little difference that schedules are filled in by enumerators who are specially appointed for the purpose. These enumerators go to respondents along with the schedules, put to them the questions from the proforma in the order the questions are listed, and record the replies in the space meant for the same in the proforma. Enumerators explain the aims and objects of the investigation and also remove the difficulties which any respondent may feel in understanding the implications of a particular question.

This method of data collection is very useful in extensive enquiries and can lead to fairly
reliable results. It is, however, very expensive and is usually adopted in investigations
conducted by governmental agencies or by some big organizations.

Difference between Schedule and Questionnaire

1. A questionnaire is generally sent through mail to informants to be answered as specified in a covering letter, but otherwise without further assistance from the sender. A schedule is generally filled in by the research worker or enumerator, who can interpret the questions when necessary.
2. With a questionnaire, data collection is cheap and economical, as money is spent only on preparing the questionnaire and mailing it to respondents. With a schedule, data collection is more expensive, as money is spent on enumerators and on imparting training to them; money is also spent in preparing the schedules.
3. With a questionnaire, non-response is usually high, as many people do not respond and many return the questionnaire without answering all questions; bias due to non-response often remains indeterminate. With a schedule, non-response is very low because it is filled in by enumerators who are able to get answers to all questions; but even here there remains the danger of interviewer bias and cheating.
4. With a questionnaire, it is not clear who replies. With a schedule, the identity of the respondent is known.
5. The questionnaire method is likely to be very slow since many respondents do not return the questionnaire in time. With schedules, information is collected well in time as they are filled in by enumerators.
6. No personal contact is possible in the case of a questionnaire, as questionnaires are sent to respondents by post, who in turn return them by post. With schedules, direct personal contact is established.
7. The questionnaire method can be used only when respondents are literate and cooperative. With schedules, information can be gathered even when the respondents happen to be illiterate.
8. With questionnaires, wider and more representative distribution of the sample is possible. With schedules, there remains the difficulty of sending enumerators over a relatively wide area.
9. The risk of collecting incomplete and wrong information is relatively higher under the questionnaire method, particularly when people are unable to understand the questions properly. The information collected through schedules is generally complete and accurate, as enumerators can remove any difficulties faced by respondents in correctly understanding the questions; as a result it is relatively more accurate than that obtained through questionnaires.
10. The success of the questionnaire method lies more on the quality of the questionnaire itself. The success of the schedule method depends upon the honesty and competence of enumerators.
11. The physical appearance of a questionnaire must be quite attractive. This may not be the case with schedules, as they are filled in by enumerators and not by respondents.
12. The observation method cannot be used along with a questionnaire. Along with a schedule, the observation method can also be used.
SECONDARY DATA:

Secondary data is the data that have been already collected by and readily
available from other sources. Such data are cheaper and more quickly obtainable than
the primary data and also may be available when primary data cannot be obtained at all.

The researcher must be very careful in using secondary data. He must scrutinize them minutely, because it is just possible that the secondary data may be unsuitable or inadequate in the context of the problem which the researcher wants to study. By way of caution, the researcher, before using secondary data, must see that they possess the following characteristics:

• Reliability of data

The reliability can be tested by finding out such things about the said data: (a)
Who collected the data (b) What were the sources of data? (c) Were data collected by
using proper methods? (d) At what time were they collected? (e) Was there any bias of
the compiler? (f) What level of accuracy was desired? Was it achieved?

• Suitability of data

The data that are suitable for one enquiry may not necessarily be found suitable for another enquiry. Hence, if the available data are found to be unsuitable,
they should not be used by the researcher. In this context, researcher must very
carefully scrutinize the definition of various terms and units of collection used at the
time of collecting the data from the primary source originally. Similarly, the object,
scope and nature of the original enquiry must also be studied. If the researcher finds
differences in these, the data will remain unsuitable for the present enquiry and should
not be used.

• Adequacy of data

If the level of accuracy achieved in data is found inadequate for the purpose of
the present enquiry, they will be considered as inadequate and should not be used by
the researcher. The data will be considered inadequate if they are related to an area
which may be either narrow or wider than the area of the present enquiry.
MERITS OF SECONDARY DATA:

• Use of secondary data is very convenient.


• It saves time and finance.

• In some enquiries primary data cannot be collected.

• Reliable secondary data are generally available for many investigations.

DEMERITS OF SECONDARY DATA:

• It is very difficult to find sufficiently accurate secondary data.

• It is very difficult to find secondary data which exactly fulfils the need of
present investigation.

• Extra caution is required to use secondary data.

• These are not available for all types of enquiries.

SELECTION OF APPROPRIATE METHOD FOR


DATA COLLECTION:

As there are many methods for collection of data, it is important that we


should choose the most appropriate according to the situation provided. So the
following factors have to be kept in mind while selecting a particular method:

• Nature, scope and object of enquiry

The nature, the scope as well as the object of the enquiry are very important as
they affect the choice of the method. The method selected should be such that it suits
the type of enquiry that is to be conducted by the researcher. This factor is also
important in deciding whether the data already available (secondary data) are to be used
or the data not yet available (primary data) are to be collected.

• Availability of funds
When a method is chosen, it is important to check whether there is an adequate amount of funds to make it work. If the method is too expensive, it will be very hard to carry out the enquiry.

• Time factor
Time is an important factor as it decides when the enquiry has to end. Some methods take relatively more time, whereas with others the data can be collected in a comparatively shorter duration. The time at the disposal of the researcher thus affects the selection of the method by which the data are to be collected.

• Precision required

Precision required is yet another important factor to be considered at the time of selecting the method
of collection of data.
SOURCES OF SECONDARY DATA:

The sources of secondary data can be classified as internal sources and external
sources.

Internal Sources of Secondary Data

If available, internal secondary data may be obtained with less time, effort and
money than the external secondary data. In addition, they may also be more pertinent
to the situation at hand since they are from within the organization. The internal sources
include

• Accounting resources

These give a great deal of information which can be used by the marketing researcher. They give information about internal factors.

• Sales Force Report

It gives information about the sales of a product. The information provided is from outside the organization.

• Internal Experts

These are people who are heading the various departments. They can give an idea of how a particular thing is working.

• Miscellaneous Reports

These include information obtained from various operational reports.

External Sources of Secondary Data


External Sources are sources which are outside the company in a larger
environment. Collection of external data is more difficult because the data have much
greater variety and the sources are much more numerous. External data can be divided
into following classes:
• Govt. Publications

Government sources provide an extremely rich pool of data for the researchers.
In addition, many of these data are available free of cost on internet websites.
The Central Statistical Organization (CSO) and various state governments collect, compile and publish data on a regular basis.

• International Bodies

All foreign governments and international agencies publish regular reports of


international significance. These reports are regularly published by agencies like the United Nations Organization, the International Labour Organization, the World Meteorological Organization, etc.

• Semi Govt. Publications

Semi-government organizations like municipalities, District Boards and others also publish reports in respect of birth, death, education, sanitation and many other related fields.

• Reports of Committee and Commissions

Central govt. or State govt. sometimes appoints committees and commissions


on matters of great importance. Reports of such committees are of great significance as they provide invaluable data. Examples include the Shah Commission Report, the Sarkaria Commission Report and the Finance Commission Reports, etc.

• Private Publications

Some commercial and research institutes publish reports regularly. These include the Institute of Economic Growth, Stock Exchanges, the National Council of Educational Research and Training (NCERT), the National Council of Applied Economic Research (NCAER), etc.

• Newspapers and Magazines


Various newspapers as well as magazines also do collect data in respect of
many social and economic aspects.

• Research Scholars

Individual research scholars collect data to complete their research work, which is further published with their research papers.

SELECTION OF APPROPRIATE METHOD FOR DATA COLLECTION

The selection of an appropriate method of data collection is influenced by several factors, as discussed below:

 The Nature of phenomenon under study: The nature of the phenomenon under study largely
influences the choice of the method of the data collection. Each research phenomenon has its particular
characteristics and, therefore, needs different approaches and methods of data collection. For
example, some phenomena, such as clinical practices or processes in particular nursing procedures, can only be studied appropriately through observation. Similarly, the knowledge of a group of nurses can only be accessed through questioning or interviews. Therefore, the nature of the
phenomenon under study significantly affects the selection of particular method of data collection.
 Type of research subjects: Data collection methods are also influenced by the type of subjects under
study. For example, data collection from physically or psychologically disabled subjects can be done
either by interview or through observation, where data collection through questionnaire is not feasible.
On the other hand, if data has to be collected from objects or institutions, questionnaires or interviews
may not be possible at all, and researchers will have to depend mostly on observation to collect
relevant data.
 The type of research study: Quantitative and qualitative research studies need different methods of
data collection. For example, in qualitative research, more in-depth information is required; therefore,
focused group interviews or unstructured participatory interviews are feasible for data collection, while
for quantitative research studies more structured interviews, questioning, or observation is used for data
collection.
 The purpose of the research study: The purpose of the study also influences the choice of the methods
of data collection, such as in a study conducted with the purpose of the exploration of phenomenon, in-
depth interviews may be needed for data collection, while studies conducted with purpose of
description or correlation of study variables may need more structured methods of data collection.
 Size of the study sample: When a study is conducted on a small sample, interviews or direct
observation may be possible, while these methods can be tedious for large samples. For larger samples,
questionnaires can be better and more referable methods for data collection. Interviews and observation
methods will also be cost-effective and easy for smaller groups, while questionnaires will be
convenient, easier and cost-effective methods of data collection for larger samples.
 Distribution of the target population: If target population is spread in a large geographical area, it will
not be possible to carry out interviews or observation, and therefore, mailed questionnaires may be a
better option, which will be more convenient and cost-effective in such conditions.
 Time frame of the study: If a research is conducted for the long time, it may permit the researcher to
use the less-structured methods of data collection to gain in-depth information, while short time-frame
studies may not allow the researcher to use the unstructured methods of data collection, where he or
she gets very little time for data collection and analysis. Therefore, structured methods of data
collection are used more short-term research designs.
 Literacy level of the subjects: Illiterate subjects put constrains on the use of self-responding methods
of data collection such as questionnaires. for illiterate subjects, interviews conducted in native language
is one of the few possible methods of data collection used, while more varied and numerous options in
methods of data collection are available for literate subjects.
 Availability of resources and manpower: Some of the method of data collection require more
quantities of resources and manpower, such as conducting interviews and observation compared to the
use of questionnaires. Therefore, availability of resources and manpower also affects the selection of
methods of data collection.
 Researcher's knowledge level and competence: The researcher's knowledge and competence also
affects the selection of methods of data collection, for example conducting an interview observation
may require special social and psychological knowledge, skills, and competence, while the use of
questionnaires may not demand these skills, however for the development and construction of a good
questionnaire, good writing skills may be required.

CRITERIA OF EVALUATION/ASSESSMENT OF DATA COLLECTION METHODS


The appropriateness of a data collection method may be evaluated or assessed using the following criteria:
1. Accuracy and completeness of data collection: The researcher must ensure that the data collection methods used will yield accurate and complete data to answer the research questions or test the hypotheses.
2. Compatibility with the educational level, sociocultural values, and beliefs of the subjects.
3. Cost-effectiveness of, and speed in, the data collection procedure.
4. Accordance with the nature of the phenomenon, and the type, purpose, time frame, and resources available for the study.
5. Further, the following criteria may be considered while evaluating or assessing the method of data collection:

 Is the data collection method complete in covering all aspects of the study and the study variables?
 Are the data collection methods thoroughly described?
 Are the data collection methods in accordance with the research questions/hypotheses to be tested?
 Are the validity and reliability of the data collection methods established?
 Are the methods used for data collection sufficient for complete coverage of the research data, or are additional methods required?
 Are anonymity and confidentiality assured?
 Are the instruments described in detail?
 Were the criterion measures or scoring methods clearly established?

RELIABILITY AND VALIDITY OF INSTRUMENT


Reliability refers to the extent to which the instrument yields the same results over multiple trials. Validity refers to
the extent to which the instrument measures what it was designed to measure.
What Is Validity?
In simple terms, validity (also called “construct validity”) is all about whether a research instrument
accurately measures what it’s supposed to measure.
For example, let’s say you have a set of Likert scales that are supposed to quantify someone’s level of overall
job satisfaction. If this set of scales focused purely on only one dimension of job satisfaction, say pay
satisfaction, this would not be a valid measurement, as it only captures one aspect of the multidimensional
construct. In other words, pay satisfaction alone is only one contributing factor toward overall job
satisfaction, and therefore it’s not a valid way to measure someone’s job satisfaction.
Oftentimes in quantitative studies, the way in which the researcher or survey designer interprets a question or
statement can differ from how the study participants interpret it. Given that respondents don’t have the
opportunity to ask clarifying questions when taking a survey, it’s easy for these sorts of misunderstandings to
crop up. Naturally, if the respondents are interpreting the question in the wrong way, the data they provide
will be pretty useless. Therefore, ensuring that a study’s measurement instruments are valid – in other words,
that they are measuring what they intend to measure – is incredibly important.
There are various types of validity and we’re not going to go down that rabbit hole here, but it’s worth
quickly highlighting the importance of making sure that your research instrument is tightly aligned with the
theoretical construct you’re trying to measure. In other words, you need to pay careful attention to how the
key theories within your study define the thing you’re trying to measure – and then make sure that your survey
presents it in the same way.
For example, sticking with the “job satisfaction” construct we looked at earlier, you’d need to clearly define
what you mean by job satisfaction within your study (and this definition would of course need to be
underpinned by the relevant theory). You’d then need to make sure that your chosen definition is reflected in
the types of questions or scales you’re using in your survey. Simply put, you need to make sure that your
survey respondents are perceiving your key constructs in the same way you are. Or, even if they’re not, that
your measurement instrument is capturing the necessary information that reflects your definition of the
construct at hand.
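
To make this a little more concrete, the sketch below shows one common way researchers gather evidence of validity for a new survey instrument: correlating scores from the new scale with an established measure of the same construct (often described as convergent validity evidence). The scale names, responses and benchmark scores below are entirely hypothetical, so treat this as an illustration under those assumptions rather than a prescribed procedure.

```python
# A minimal sketch of a convergent validity check: correlating total scores on a
# hypothetical new job-satisfaction scale with scores on an established measure
# of the same construct. All data and names here are invented for illustration.
import numpy as np

# Hypothetical responses from 10 participants: items 1-4 of the new
# job-satisfaction scale, each rated on a 1-5 Likert scale.
new_scale_items = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [3, 4, 3, 3],
    [1, 1, 2, 1],
])

# Scores for the same 10 participants on an established, previously
# validated job-satisfaction measure (hypothetical values).
benchmark_scores = np.array([16, 10, 20, 12, 7, 18, 8, 19, 14, 6])

# Total score on the new scale for each participant.
new_scale_totals = new_scale_items.sum(axis=1)

# Pearson correlation between the new scale and the benchmark. A strong
# positive correlation is usually read as evidence that the new scale taps
# the same underlying construct as the established one.
r = np.corrcoef(new_scale_totals, benchmark_scores)[0, 1]
print(f"Convergent validity correlation: r = {r:.2f}")
```

Of course, a single correlation is only one strand of validity evidence; as noted above, the content of the items still needs to be tightly aligned with how the underlying theory defines the construct.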

What Is Reliability?
As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the “thing” it’s supposed to be measuring, reliability is concerned with consistency and stability. In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon, under the same conditions. As you can probably imagine, a measurement instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that doesn’t – in other words, it can be trusted to provide consistent measurements. And that, of course, is what you want when undertaking empirical research. In a more domestic context, just imagine if your bathroom scale gave you a different number every time you hopped on and off of it – you wouldn’t feel too confident in its ability to measure your body weight.

It’s worth mentioning that reliability also extends to the person using the measurement instrument. For example, if two researchers use the same instrument (let’s say a measuring tape) and they get different measurements, there’s likely an issue in how one (or both) of them is using the measuring tape. So, when you think about reliability, consider both the instrument and the researcher as part of the equation.

As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one that you’ll likely come across for survey instruments is Cronbach’s alpha, a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct. In other words, Cronbach’s alpha indicates how closely related the items are and whether they consistently capture the same concept. It is a measure of internal consistency – that is, how closely related a set of items are as a group – and is widely treated as a measure of scale reliability. Note, however, that a “high” value for alpha does not by itself imply that the measure is unidimensional.
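
To illustrate, here is a minimal sketch of how Cronbach’s alpha can be computed directly from its standard formula, alpha = k/(k−1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The response data below are hypothetical, and in practice most statistical packages will report this value for you.

```python
# A minimal sketch computing Cronbach's alpha by hand from the standard formula:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)),
# where k is the number of items. The responses below are hypothetical.
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: 2-D array, rows = respondents, columns = scale items."""
    k = item_scores.shape[1]                          # number of items
    item_vars = item_scores.var(axis=0, ddof=1)       # sample variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses from 8 people to a 4-item Likert scale (1-5).
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
])

print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
# Values of roughly 0.7 or higher are often taken to indicate acceptable
# internal consistency, although this rule of thumb is debated and, as noted
# above, a high alpha does not by itself show the scale is unidimensional.
```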
 Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it’s supposed to measure.
 Reliability is concerned with whether that measurement is consistent and stable when measuring the same
phenomenon under the same conditions.

In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-
quality, accurate data that help you answer your research questions.