RM Unit-II
A population may be studied using one of two approaches: taking a census, or selecting a sample.
It is important to note that whether a census or a sample is used, both provide information that can be used to
draw conclusions about the whole population.
A sample is a subset of units in a population, selected to represent all units in a population of interest. It is a
partial enumeration because it is a count from part of the population.
Information from the sampled units is used to estimate the characteristics for the entire population of interest.
Once a population has been identified a decision needs to be made about whether taking a census or selecting a
sample will be the more suitable option. There are advantages and disadvantages to using a census or sample
to study a population:
If good sampling techniques are used, the results can be very representative of the actual population. However, a sample is often not suitable for producing benchmark data: because data are collected from only a subset of units and inferences are made about the whole population, the data are subject to 'sampling' error.
Census is the process of collecting data from every member of a population, while sampling is the process of
collecting data from a subset of a population. In a census, every member of a population is included, while in
sampling, a smaller group of individuals is selected to represent the population as a whole.
Census vs Sampling:
1. Census: involves collecting data from every single member of a population. Sampling: involves collecting data from a subset or selected group of the population.
2. Census: requires a large amount of resources and time to conduct the survey and gather data. Sampling: requires fewer resources and is quicker to conduct, as it involves only a specific group of the population.
3. Census: provides a complete and accurate representation of the population, as it covers all the members. Sampling: provides an estimate or a general idea of the population, based on the sample selected.
4. Census: can be more expensive than sampling, as it involves collecting data from every member of the population. Sampling: is generally less expensive than a census, as it involves only a specific group of the population.
5. Census: can be useful for small populations or when detailed information is needed about the whole population. Sampling: can be useful for large populations or when a general overview of the population is needed.
6. Census: the margin of error is typically very small, as it covers the whole population. Sampling: the margin of error is typically larger than in a census, as the sample size is smaller than the population size.
Sample: The sample is a part or a small section of the population selected for study.
Sampling: It is the procedure of selecting a sample from the population.
Sampling Frame: A set of information used to identify a sample population for
statistical treatment. A sampling frame includes a numerical identifier for each
individual, plus other identifying information about characteristics of the individuals, to
aid in analysis and allow for division into further frames for more in-depth analysis.
ESSENTIALS OF GOOD SAMPLING:
In order to reach the right conclusions, a sample must possess the following
essential characteristics.
1. Representative: The sample should truly represent the characteristics of the universe. For this,
the investigator should be free from bias and the method of collection should be appropriate.
2. Adequacy: The size of the sample should be adequate i.e., neither too large nor small but
commensurate with the size of the population.
3. Homogeneity: There should be homogeneity in the nature of all the units selected for the
sample. If the units of the sample are of heterogeneous character it will be impossible to make a
comparative study with them.
4. Independence: The method of selection of the sample should be such that the items of
the sample are selected in an independent manner. This means that selection of one item
should not influence the selection of another item in any manner and that each item should
be selected on the basis of its own merit.
NEED OF SAMPLING:
Sampling is necessary because of the following reasons
• It is not technically or economically feasible to take the entire population into
consideration.
• Due to dynamic changes in the business, industrial and social environment, it is
necessary to make quick decisions; a sample is therefore necessary to save time.
• If data collection takes a long time, then value of some characteristics may
change over the period of time thus, defeating the very purpose of data analysis.
Thus, due to importance of time element sample is required.
• Sample, if representative, may yield more accurate results than the total census
because sample can be more accurately supervised.
STEPS IN SAMPLING PROCESS:
The steps involved in the sampling procedure are as follows:
1. Define the target population: the group of individuals or objects to which researchers are interested in generalizing their findings. A well-defined population reduces the likelihood of including undesirable individuals or objects. A sample is taken from the target population.
2. Specify the sampling frame: the group of individuals or objects from which the researcher will draw the sample. It is the actual list of all units in the target population from which the sample is taken.
3. Choose the sampling technique: sampling can be done by two techniques, probability (random selection) or non-probability (non-random selection). If the sampling frame is approximately the same as the target population, random selection may be used to select samples.
4. Determine the sample size: the number of units in the sample. Sample size determination depends on many factors such as time, cost, and facilities.
5. Execute the sampling plan: once the population, sampling frame, sampling technique, and sample size are identified, the researcher can use all that information to execute the sampling plan and collect the data required for the research.
In Statistics, there are different sampling techniques available to get relevant results from the population. The
two different types of sampling methods are:
Probability Sampling
Non-probability Sampling
The probability sampling method utilizes some form of random selection. In this method, every eligible
individual has a known, non-zero chance of being selected in the sample from the whole sample space. This
method is more time-consuming and expensive than the non-probability sampling method. The benefit of using
probability sampling is that it helps ensure the sample is representative of the population.
Probability Sampling methods are further classified into different types, such as simple random sampling,
systematic sampling, stratified sampling, and clustered sampling. Let us discuss the different types of
probability sampling methods along with illustrative examples here in detail.
Simple Random Sampling
In the simple random sampling technique, every item in the population has an equal chance of being
selected in the sample. Since item selection depends entirely on chance, this method is known as the
"method of chance selection". When the sample size is large and the items are chosen randomly, it is also known as
"representative sampling".
Example:
Suppose we want to select a simple random sample of 200 students from a school of 500 students. We can assign a
number from 1 to 500 to every student in the school database and use a random number generator to select a
sample of 200 numbers.
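As a sketch, the example above can be reproduced with Python's standard random module (the roster of 500 numbered students is a hypothetical placeholder):

```python
import random

# Hypothetical roster: students numbered 1 to 500 in the school database.
population = list(range(1, 501))

random.seed(42)  # fixed seed only so the draw is repeatable
# random.sample draws without replacement, so every student has an equal
# chance of selection, the defining property of simple random sampling.
sample = random.sample(population, k=200)

print(len(sample))       # 200 students
print(len(set(sample)))  # 200 distinct students (no duplicates)
```

Because the draw is without replacement, no student can appear twice in the sample.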
Systematic Sampling
In the systematic sampling method, items are selected from the target population by choosing a random
starting point and then selecting every kth item after a fixed sampling interval. The interval is calculated by dividing the
total population size by the desired sample size.
Example:
Suppose the names of 300 students of a school are sorted in reverse alphabetical order. To select a sample
of 20 students by the systematic sampling method, the sampling interval is 300/20 = 15. We randomly select a starting
number between 1 and 15, say 5. From number 5 onwards, we select every 15th person on the sorted list, ending
up with a sample of 20 students.
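The systematic procedure above can be sketched in Python; the list of 300 names is a hypothetical stand-in for the sorted roster:

```python
import random

# Hypothetical sorted list of 300 student names.
students = [f"student_{i:03d}" for i in range(1, 301)]

sample_size = 20
interval = len(students) // sample_size  # 300 // 20 = 15

random.seed(5)
start = random.randint(1, interval)  # random starting point in 1..15
# From the starting point onwards, take every 15th student on the list.
sample = students[start - 1::interval]

print(interval)     # 15
print(len(sample))  # 20
```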
Stratified Sampling
In a stratified sampling method, the total population is divided into smaller groups (strata) to complete the sampling
process. The smaller groups are formed based on shared characteristics in the population. After separating the
population into strata, the statistician randomly selects the sample from each group.
For example, there are three bags (A, B and C), each with different balls. Bag A has 50 balls, bag B has 100
balls, and bag C has 200 balls. We have to choose a sample of balls from each bag proportionally, say 10% from
each: 5 balls from bag A, 10 balls from bag B and 20 balls from bag C.
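The proportional allocation in the bags example can be sketched as follows (the ball labels are hypothetical):

```python
import random

# Hypothetical strata: three bags holding 50, 100 and 200 balls.
strata = {
    "A": [f"A{i}" for i in range(50)],
    "B": [f"B{i}" for i in range(100)],
    "C": [f"C{i}" for i in range(200)],
}

total = sum(len(balls) for balls in strata.values())  # 350 balls overall
overall_sample = 35                                   # 10% of the population

random.seed(7)
sample = []
for name, balls in strata.items():
    # Proportional allocation: each stratum contributes according to its
    # share of the population, i.e. 5, 10 and 20 balls respectively.
    k = round(overall_sample * len(balls) / total)
    sample.extend(random.sample(balls, k))

print(len(sample))  # 5 + 10 + 20 = 35
```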
Clustered Sampling
In the cluster sampling method, clusters or groups of people are formed from the population set. Each group
has similar significant characteristics, and each has an equal chance of being part of the sample. This
method applies simple random sampling to the clusters of the population.
Example:
An educational institution has ten branches across the country with almost the same number of students. If we want
to collect some data regarding facilities and other matters, we cannot travel to every unit to collect the required
data. Hence, we can use random sampling to select three or four branches as clusters.
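One-stage cluster sampling over the ten branches might look like this sketch (the branch and student names are hypothetical):

```python
import random

# Hypothetical institution: ten branches, each with about 100 students.
branches = {f"branch_{b}": [f"b{b}_s{i}" for i in range(100)]
            for b in range(1, 11)}

random.seed(3)
# Randomly choose 3 whole branches; every student in a chosen branch
# enters the sample (one-stage cluster sampling).
chosen = random.sample(sorted(branches), k=3)
sample = [s for b in chosen for s in branches[b]]

print(len(chosen))  # 3 clusters
print(len(sample))  # 300 students
```

Note the contrast with stratified sampling: here whole clusters are selected, rather than a few units from every group.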
All these four methods can be understood in a better manner with the help of the figure given below. The
figure contains various examples of how samples will be taken from the population using different techniques.
The non-probability sampling method is a technique in which the researcher selects the sample based on
subjective judgment rather than random selection. In this method, not all members of the population
have a chance to participate in the study.
Non-probability Sampling methods are further classified into different types, such as convenience sampling,
consecutive sampling, quota sampling, judgmental sampling, snowball sampling. Here, let us discuss all these
types of non-probability sampling in detail.
Convenience Sampling
In the convenience sampling method, samples are selected from the population simply because they are
conveniently available to the researcher. The samples are easy to select, but the researcher does not choose a
sample that represents the entire population.
Example:
In researching customer support services in a particular region, we ask a few customers to complete a
survey on the products after purchase. This is a convenient way to collect data, but since we only surveyed
customers who bought the same product, the sample is not representative of all the customers in
that area.
Consecutive Sampling
Consecutive sampling is similar to convenience sampling, with a slight variation. The researcher picks a single
person or a group of people for sampling, collects data from them for a period of time, analyzes the
results, and then moves on to another group if needed.
Quota Sampling
In the quota sampling method, the researcher forms a sample of individuals chosen to represent
the population based on specific traits or qualities (quotas). The researcher chooses the sample subsets that yield a
useful collection of data that generalizes to the entire population.
Purposive Sampling
In purposive sampling, the samples are selected based solely on the researcher's knowledge. As their knowledge
is instrumental in creating the samples, there is a chance of obtaining highly accurate answers with a
minimal margin of error. It is also known as judgmental sampling or authoritative sampling.
Snowball Sampling
Snowball sampling is also known as a chain-referral sampling technique. In this method, the samples have
traits that are difficult to find. So, each identified member of a population is asked to find the other sampling
units. Those sampling units also belong to the same targeted population.
The below table shows a few differences between probability sampling methods and non-probability sampling
methods.
SAMPLING AND NON-SAMPLING ERRORS:
The key difference between sampling and non-sampling error is that sampling error arises from taking a sample from a larger population, while non-sampling error arises from other sources, such as errors in data collection or data entry. Sampling error is the deviation of the selected sample from the true characteristics, traits, behaviours, qualities or figures of the entire population. Various forces combine to produce deviations of sample statistics from population parameters, and errors are classified, according to their cause, into sampling and non-sampling error.
What are Sampling Errors?
Sampling errors are statistical errors that arise when a sample does not represent the whole population. They
are the difference between the real values of the population and the values derived by using samples from the
population. Sampling errors occur when numerical parameters of an entire population are derived from a
sample of the entire population. Since the whole population is not included in the sample, the parameters
derived from the sample differ from those of the actual population.
Sampling errors are deviations in the sampled values from the values of the true population emanating from
the fact that a sample is not an actual representative of a population of data.
When there is a fault in the data collection, the results obtained from sampling become invalid. Furthermore,
when a sample is not selected randomly, or the selection is based on bias, it fails to represent the whole population,
and sampling errors will certainly occur.
They can be prevented if the analysts select subsets or samples of data to represent the whole population
effectively. Sampling errors are affected by factors such as the size and design of the sample,
population variability, and sampling fraction.
Increasing the size of samples can reduce sampling errors; however, to reduce them by half, the sample size
needs to be increased four times, since sampling error is inversely proportional to the square root of the sample
size. If the selected samples are small and do not adequately represent the
whole data, the analysts can select a greater number of samples for satisfactory representation.
The population variability causes variations in the estimates derived from different samples, leading to larger
errors. The effect of population variability can be reduced by increasing the size of the samples so that these
can more effectively represent the population.
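The "quadruple the sample to halve the error" relationship follows from the standard error of the mean, SE = sigma / sqrt(n). A quick check in Python (the population values are simulated for illustration):

```python
import math
import random
import statistics

random.seed(0)
# Simulated population with mean about 50 and standard deviation about 10.
population = [random.gauss(50, 10) for _ in range(100_000)]
sigma = statistics.pstdev(population)

def standard_error(n):
    # Theoretical sampling error of the sample mean for sample size n.
    return sigma / math.sqrt(n)

# Quadrupling the sample size from 100 to 400 halves the standard error.
ratio = standard_error(100) / standard_error(400)
print(round(ratio, 2))  # 2.0
```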
Moreover, sampling errors must be considered when publishing survey results so that the accuracy of the
estimates and the related interpretations can be established.
Practical Example
Suppose the producers of Company XYZ want to determine the viewership of a local program that airs twice a
week. The producers will need to determine the samples that can represent various types of viewers. They may
need to consider factors like age, level of education, and gender.
For example, people between the ages of 14 and 18 usually have fewer commitments, and most of them can
spare time to watch the program twice weekly. On the contrary, people between the age of 18 and 35 usually
have tighter schedules and will not have time to watch TV.
Hence, it is important to draw a sample proportionately. Otherwise, the results will not represent the real
population.
Since the exact population parameter is not known, sampling errors for samples are generally unknown.
However, analysts can use analytical methods to measure the amount of variation caused by sampling errors.
Categories or Reasons of Sampling Errors
Population Specification Error – Happens when the analysts do not understand who to survey. For
example, for a survey of breakfast cereals, the population can be the mother, children, or the entire
family.
Selection Error – Occurs when the respondents’ survey participation is self-selected, implying only
those who are interested respond. Selection errors can be reduced by encouraging participation.
Sample Frame Error – Occurs when a sample is selected from the wrong population data.
Non-Response Error – Occurs when a useful response is not obtained from the surveys. It may
happen due to the inability to contact potential respondents or their refusal to respond.
NON-SAMPLING ERROR
Non-sampling error refers to all sources of error that are unrelated to sampling. Non-sampling errors are
present in all types of survey, including censuses and administrative data. They arise for a number of reasons:
the frame may be incomplete, some respondents may not accurately report data, data may be missing for some
respondents, etc.
Non-sampling errors can be classified into two groups: random errors and systematic errors.
Random errors are errors whose effects approximately cancel out if a large enough sample is used;
they do not bias the results, but they do increase their variability.
Systematic errors are errors that tend to go in the same direction, and thus accumulate over the entire
sample leading to a bias in the final results. Unlike random errors, this bias is not reduced by increasing
the sample size. Systematic errors are the principal cause of concern in terms of a survey’s data quality.
Unfortunately, non-sampling errors are often extremely difficult, if not impossible, to measure.
Non-sampling error can occur in all aspects of the survey process, and can be classified into the following
categories: coverage error, measurement error, non-response error and processing error.
Coverage error
Coverage error occurs when there is a discrepancy between the target population and the sampling frame: units may be missing from the frame (undercoverage), or may be duplicated or out of scope (overcoverage).
Measurement error
Measurement error, also called response error, is the difference between measured values and true values. It
consists of bias and variance, and it results when data are incorrectly requested, provided, received or
recorded. These errors may occur because of inefficiencies with the questionnaire, the interviewer, the
respondent or the survey process.
Poor questionnaire design: It is essential that sample survey or census questions are worded carefully
in order to avoid introducing bias. If questions are misleading or confusing, then the responses may end
up being distorted.
Interviewer bias: An interviewer can influence how a respondent answers the survey questions. This
may occur when the interviewer is too friendly or aloof or prompts the respondent. To prevent this,
interviewers must be trained to remain neutral throughout the interview. They must also pay close
attention to the way they ask each question. If an interviewer changes the way a question is worded, it
may impact the respondent’s answer.
Respondent error: Respondents can also provide incorrect answers. Faulty recollections, tendencies to
exaggerate or underplay events, and inclinations to give answers that appear more socially acceptable
are several reasons why a respondent may provide a false answer.
Problems with the survey process: Errors can also occur because of a problem with the actual survey
process. Using proxy responses, meaning taking answers from someone other than the respondent, or
lacking control over the survey procedures are just a few ways of increasing the risk of response errors.
Non-response error
Estimates obtained after nonresponse has been observed and imputation has been used to deal with this
nonresponse are usually not equivalent to the estimates that would have been obtained had all the desired
values been observed without error. The difference between these two types of estimates is called
the nonresponse error. There are two types of non-response errors: total and partial.
Total nonresponse error occurs when all or almost all data for a sampling unit are missing. This can
happen if the respondent is unavailable or temporarily absent, the respondent is unable to participate or
refuses to participate in the survey, or if the dwelling is vacant. If a significant number of sampled units
do not respond to a survey, then the results may be biased since the characteristics of the non-
respondents may differ from those who have participated.
Partial nonresponse error occurs when respondents provide incomplete information. For certain
people, some questions may be difficult to understand, they may refuse or forget to answer a question.
Poorly designed questionnaires or poor interviewing techniques can also result in partial
nonresponse error. To reduce this form of error, care should be taken in designing and testing
questionnaires. Adequate interviewer training and appropriate edit and imputation strategies will also
help minimize this error.
Processing error
Processing error occurs during data processing. It includes all data processing activities after collection and
prior to estimation, such as errors in data capture, coding, editing and tabulation of the data as well as in the
assignment of survey weights.
Coding errors occur when different coders code the same answer differently, which can be caused by
poor training, incomplete instructions, variance in coder performance (i.e. tiredness, illness), data entry
errors, or machine malfunction (some processing errors are caused by errors in the computer
programs).
Data capture errors result when data are not entered into the computer exactly as they appear on the
questionnaire. This can be caused by the complexity of alphanumeric data and by the lack of clarity in
the answer provided. The physical layout of the questionnaire itself or the coding documents can cause
data capture errors. The method of data capture, manual or automated (for example, using an optical
scanner), can also result in errors.
Editing and imputation errors can be caused by the poor quality of the original data or by its
complex structure. When the editing and imputation processes are automated, errors can also be the
result of faulty programs that were insufficiently tested. The choice of an inappropriate imputation
method can introduce bias. Errors can also result from incorrectly changing data that were found to be
in error, or by erroneously changing correct data.
'Sample size' is a market research term used for defining the number of individuals included in conducting
research. Researchers choose their sample based on demographics, such as age, gender, or physical
location. The criteria can be vague or specific.
For example, you may want to know what people within the 18-25 age range think of your product. Or, you
may only require your sample to live in the United States, giving you a wide population range. The total
number of individuals in a particular sample is the sample size.
Sample size determination is the process of choosing the right number of observations or people from a larger
group to use in a sample. The goal of figuring out the sample size is to ensure that the sample is big enough to
give statistically valid results and accurate estimates of population parameters but small enough to be
manageable and cost-effective.
In many research studies, getting information from every member of the population of interest is not possible
or useful. Instead, researchers choose a sample of people or events that is representative of the whole to study.
How accurate and precise the results are can depend a lot on the size of the sample.
Choosing the statistically significant sample size depends on a number of things, such as the size of the
population, how precise you want your estimates to be, how confident you want to be in the results, how
different the population is likely to be, and how much money and time you have for the study. Statistics are
often used to figure out how big a sample should be for a certain type of study and research question.
Figuring out the sample size is important in ensuring that research findings and conclusions are valid and
reliable.
Hypothetically, you choose the population of New York, which is 8.49 million. You use a sample size
determination formula to select a sample of 500 individuals that fit into the consumer panel requirement. You
can use the responses to help you determine how your audience will react to the new product.
However, determining a sample size requires more than just throwing your survey at as many people as
possible. If your estimated sample sizes are too big, it could waste resources, time, and money. A sample size
that’s too small doesn’t allow you to gain maximum insights, leading to inconclusive results.
Before we jump into sample size determination, let’s take a look at the terms you should know:
1. Population size:
Population size is how many people fit your demographic. For example, you want to get information on
doctors residing in North America. Your population size is the total number of doctors in North
America. Don’t worry! Your population size doesn’t always have to be that big. Smaller population sizes can
still give you accurate results as long as you know who you’re trying to represent.
2. Confidence level:
The confidence level tells you how sure you can be that your data is accurate. It is expressed as a percentage
and aligned to the confidence interval. For example, a 90% confidence level means you can be 90% confident
that the true population value lies within your margin of error.
There's no way to be 100% accurate when it comes to surveys. Confidence intervals tell you how far from
the population mean you're willing to allow your data to fall.
3. Margin of error:
A margin of error describes how close you can reasonably expect a survey result to fall relative to the real
population value.
4. Standard deviation:
Standard deviation is the measure of the dispersion of a data set from its mean. It measures the absolute
variability of a distribution. The higher the dispersion or variability, the greater the standard deviation and the
greater the magnitude of the deviation.
For example, you have already sent out your survey. How much variance do you expect in your responses?
That variation in response is the standard deviation.
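As a sketch, the spread of a batch of survey responses can be computed with Python's statistics module (the ratings below are made up for illustration):

```python
import statistics

# Hypothetical survey responses on a 1-10 satisfaction scale.
responses = [7, 8, 6, 9, 7, 5, 8, 10, 6, 7]

mean = statistics.mean(responses)  # average response: 7.3
sd = statistics.stdev(responses)   # sample standard deviation

print(mean)
print(round(sd, 2))  # about 1.49
```

The wider the responses spread around the mean, the larger this number gets, and the larger the sample you will need for a given margin of error.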
When you survey a large population of respondents, you’re interested in the entire group, but it’s not
realistically possible to get answers or results from absolutely everyone. So you take a random sample of
individuals which represents the population as a whole. The size of the sample is very important for getting
accurate, statistically significant results and running your study successfully.
If your sample is too small, you may include a disproportionate number of individuals who are
outliers or anomalies. These skew the results, and you don't get a fair picture of the whole population.
If the sample is too big, the whole study becomes complex, expensive and time-consuming to run, and
although the results are more accurate, the benefits don’t outweigh the costs.
With all the necessary terms defined, it’s time to learn how to determine sample size using a sample
calculation formula.
Your confidence level corresponds to a Z-score. This is a constant value needed for this equation. Here are the
z-scores for the most common confidence levels:
90% confidence level: Z-score = 1.645 (often rounded to 1.64)
95% confidence level: Z-score = 1.96
99% confidence level: Z-score = 2.576
If you choose a different confidence level, various online tools can help you find your score.
Here is an example of how the math works, assuming you chose a 90% confidence level (Z-score of about 1.64), 0.6 standard deviation, and a margin of error (confidence interval) of +/- 4%:
Necessary sample size = (Z-score)^2 x (standard deviation)^2 / (margin of error)^2
= (1.64^2 x 0.6^2) / 0.04^2
= (2.68 x 0.36) / 0.0016
= 0.9648 / 0.0016
= 603
603 respondents are needed, and that becomes your sample size.
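The worked arithmetic above can be checked in a few lines of Python; the z-squared value is rounded to 2.68 just as in the example:

```python
# Sample size = (Z-score)^2 * (standard deviation)^2 / (margin of error)^2
z_squared = 2.68   # 1.64^2, rounded as in the worked example
std_dev = 0.6      # assumed standard deviation
margin = 0.04      # margin of error of +/- 4%

n = round(z_squared * std_dev ** 2 / margin ** 2)
print(n)  # 603 respondents
```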
INTRODUCTION:
Once the researcher has decided the research design, the next job is data collection. Data
collection starts with determining what kind of data is required, followed by the selection of a
sample from a certain population. The reliability of managerial decisions depends on the quality
of data. The quality of data can be expressed in terms of how well it represents the
reality, which can be ensured by the use of a fitting data collection method. Depending upon
the source utilized, whether the data come from actual observations or from records that are
kept for normal purposes, statistical data can be classified into two categories, i.e. primary and
secondary:
PRIMARY DATA:
Primary data is a type of information that is obtained directly from first-hand sources by
means of surveys, observation or experimentation. It is data that has not been previously
published and is derived from a new or original research study and collected at the source
such as in marketing.
Advantages of primary data:
• The primary data are original and relevant to the topic of the research study, so
the degree of accuracy is very high.
• Primary data can be collected in a number of ways, such as interviews,
telephone surveys, focus groups, etc. It can also be collected across
national borders through emails and posts, and can cover a large population and
wide geographical area.
• Moreover, primary data is current, so it can give the researcher a more realistic view
of the topic under consideration.
• Reliability of primary data is very high because it is collected by the
concerned and reliable party.
Disadvantages of primary data:
• A lot of time and effort is required for data collection. By the time the data are
collected and analyzed and the report is ready, the research problem may have become
outdated, so the purpose of the research may be defeated.
• It has design problems, such as how to design the surveys. The questions must be
simple to understand and respond to.
• In some primary data collection methods there is no control over the data
collection; incomplete questionnaires always give a negative impact on the research.
Observation Method
In the observation method, the investigator collects data through personal observation. In
this method the investigator observes the behaviour of the respondents, often in disguise. Under
the observation method, information is sought by way of the investigator's own direct observation,
without asking the respondents.
Advantages of Observation
1. It is a very direct method for collecting data or information; it is best suited for the study of
human behaviour.
2. Observation is less demanding in nature, which makes it less prone to bias.
Disadvantages of Observation
1. It is an expensive and time-consuming method.
2. The information obtained is limited to observable behaviour; attitudes, opinions and past events cannot be captured.
3. Unforeseen factors may interfere with the observational task.
Questionnaire Method
In this method, a questionnaire (a set of questions printed in a definite order) is sent, usually by post, to the persons concerned, with a request to answer the questions and return the questionnaire.
Advantages of Questionnaires
• It is free from the bias of the interviewer; answers are in respondents own words.
• There is low cost even when the universe is large and is widely spread geographically.
• Respondents, who are not easily approachable, can also be reachable conveniently.
• Large samples can be made use of and thus the results can be made more
dependable and reliable.
Disadvantages of Questionnaire
• Non-response is usually high, and bias due to non-response often remains indeterminate.
• It can be used only when respondents are literate and cooperative.
• This method is likely to be very slow, since many respondents do not return the questionnaire in time.
Interview Method
In this method the interviewer personally meets the informants and asks
necessary questions to them regarding the subject of enquiry. Usually a set of questions
or a questionnaire is carried by him and questions are also asked according to that.
The interviewer efficiently collects the data from the informants by cross examining
them. The interviewer must be very efficient and tactful to get the accurate and relevant
data from the informants. Interviews like personal interview/depth interview or
telephone interview can be conducted as per the need of the study.
This method acts as a very vital tool for the collection of the data in the social
research as it is all about the direct systematic conversation between an interviewer and
the respondent. By this the interviewer is able to get relevant information for a
particular research problem.
Personal interviews
Under this method there is face-to-face contact between the interviewer and the
interviewee. This sort of interview may be in the form of direct personal investigation or
indirect oral investigation. In the case of direct personal investigation, the
interviewer has to collect the information personally from the sources concerned. He
has to be on the spot and has to meet people from whom the data have to be collected. This
method is particularly suitable for intensive investigations.
Telephone interview
• It is a very good technique for getting information about complex,
emotionally laden subjects.
Mail Survey
In this method, questionnaires are mailed to the respondents, who are expected to complete and
return them. It is very useful in extensive enquiries and is cheap and economical, since money is
spent mainly on preparing and mailing the questionnaires; however, non-response is usually high,
and the results can be biased if non-respondents differ from respondents.
DIFFERENCE BETWEEN QUESTIONNAIRE AND SCHEDULE:
1. Questionnaire: generally sent through mail to informants, to be answered as specified in a covering letter but otherwise without further assistance from the sender.
Schedule: generally filled in by the research worker or enumerator, who can interpret the questions when necessary.
2. Questionnaire: data collection is cheap and economical, as money is spent only on preparing the questionnaire and mailing it to respondents.
Schedule: data collection is more expensive, as money is spent on enumerators and on training them; money is also spent on preparing the schedules.
3. Questionnaire: non-response is usually high, as many people do not respond and many return the questionnaire without answering all questions; bias due to non-response often remains indeterminate.
Schedule: non-response is very low, because the schedule is filled in by enumerators who are able to get answers to all questions; but even here there remains the danger of interviewer bias and cheating.
4. Questionnaire: it is not clear who replies.
Schedule: the identity of the respondent is known.
5. Questionnaire: likely to be very slow, since many respondents do not return the questionnaire.
Schedule: information is collected well in time, as the schedules are filled in by enumerators.
6. Questionnaire: no personal contact is possible, as questionnaires are sent to respondents by post, who in turn return them by post.
Schedule: direct personal contact is established.
7. Questionnaire: can be used only when respondents are literate and cooperative.
Schedule: information can be gathered even when the respondents happen to be illiterate.
8. Questionnaire: a wider and more representative distribution of the sample is possible.
Schedule: there remains the difficulty of sending enumerators over a relatively wide area.
9. Questionnaire: the risk of collecting incomplete and wrong information is relatively higher, especially when people are unable to understand the questions properly.
Schedule: the information collected is generally more complete and accurate, as enumerators can remove any difficulties respondents face in understanding the questions correctly; as a result, the information collected through schedules is relatively more accurate than that obtained through questionnaires.
10. Questionnaire: success depends mainly on the quality of the questionnaire itself.
Schedule: success depends upon the honesty and competence of the enumerators.
11. Questionnaire: the physical appearance of the questionnaire must be quite attractive.
Schedule: this may not be the case, as schedules are filled in by enumerators and not by respondents.
12. Questionnaire: the observation method cannot be used alongside data collection.
Schedule: the observation method can also be used along with the schedule.
SECONDARY DATA:
Secondary data are data that have already been collected and are readily available from other sources. Such data are cheaper and more quickly obtainable than primary data, and may be available when primary data cannot be obtained at all. Before using secondary data, the researcher must examine the following:
• Reliability of data
The reliability can be tested by finding out the following about the said data: (a) Who collected the data? (b) What were the sources of data? (c) Were the data collected by using proper methods? (d) At what time were they collected? (e) Was there any bias on the part of the compiler? (f) What level of accuracy was desired, and was it achieved?
• Suitability of data
The data that are suitable for one enquiry may not necessarily be found suitable for another enquiry. Hence, if the available data are found to be unsuitable,
they should not be used by the researcher. In this context, researcher must very
carefully scrutinize the definition of various terms and units of collection used at the
time of collecting the data from the primary source originally. Similarly, the object,
scope and nature of the original enquiry must also be studied. If the researcher finds
differences in these, the data will remain unsuitable for the present enquiry and should
not be used.
• Adequacy of data
If the level of accuracy achieved in data is found inadequate for the purpose of
the present enquiry, they will be considered as inadequate and should not be used by
the researcher. The data will also be considered inadequate if they relate to an area that is either narrower or wider than the area of the present enquiry.
DEMERITS OF SECONDARY DATA:
• It is very difficult to find secondary data which exactly fulfil the needs of the present investigation.
SELECTION OF APPROPRIATE METHOD FOR DATA COLLECTION:
• Nature, scope and object of enquiry
The nature, the scope and the object of the enquiry are very important, as they affect the choice of the method. The method selected should suit the type of enquiry that is to be conducted by the researcher. This factor is also important in deciding whether the data already available (secondary data) are to be used or the data not yet available (primary data) are to be collected.
• Availability of funds
When a method is chosen, it is important to check whether adequate funds are available to carry it out. If the method is too expensive, it will be very hard to conduct the study.
• Time factor
Time is an important factor, as it determines when the study must end. Some methods take relatively more time, whereas with others the data can be collected in a comparatively shorter duration. The time at the disposal of the researcher thus affects the selection of the method by which the data are to be collected.
• Precision required
Precision required is yet another important factor to be considered at the time of selecting the method
of collection of data.
SOURCES OF SECONDARY DATA:
The sources of secondary data can be classified as internal sources and external
sources.
If available, internal secondary data may be obtained with less time, effort and
money than the external secondary data. In addition, they may also be more pertinent
to the situation at hand since they are from within the organization. The internal sources
include
• Accounting resources
Accounting records provide a great deal of information that can be used by the marketing researcher.
• Internal Experts
These are the people heading the various departments. They can give an idea of how a particular thing is working.
• Miscellaneous Reports
These include the information obtained from various operational reports.
The external sources include:
• Government Publications
Government sources provide an extremely rich pool of data for researchers, and many of these data are available free of cost on internet websites. The Central Statistical Organization (CSO) and various state governments collect, compile and publish data on a regular basis.
• International Bodies
• Private Publications
Some commercial and research institutes publish reports regularly. They are like
Institutes of Economic Growth, Stock Exchanges, National Council of Education
Research and Training (NCERT), National Council of Applied Economic
Research (NCAER) etc.
• Research Scholars
The selection of an appropriate method of data collection is influenced by several factors, as discussed below:
The nature of the phenomenon under study: The nature of the phenomenon under study largely influences the choice of the method of data collection. Each research phenomenon has its own characteristics and therefore needs different approaches and methods of data collection. For example, some phenomena can be studied appropriately only through observation, such as clinical practices or the steps of particular nursing procedures; similarly, the knowledge of a group of nurses can be assessed only through questioning or interviews. Therefore, the nature of the phenomenon under study significantly affects the selection of a particular method of data collection.
Type of research subjects: Data collection methods are also influenced by the type of subjects under study. For example, data collection from physically or psychologically disabled subjects can be done either by interview or through observation, where data collection through questionnaires is not feasible. On the other hand, if data have to be collected from objects or institutions, questionnaires or interviews may not be possible at all, and researchers will have to depend mostly on observation to collect relevant data.
The type of research study: Quantitative and qualitative research studies need different methods of data collection. For example, in qualitative research more in-depth information is required, so focus group interviews or unstructured participatory interviews are suitable for data collection, while quantitative research studies use more structured interviews, questionnaires, or observation.
The purpose of the research study: The purpose of the study also influences the choice of methods of data collection. For example, in a study conducted to explore a phenomenon, in-depth interviews may be needed, while studies conducted to describe or correlate study variables may need more structured methods of data collection.
Size of the study sample: When a study is conducted on a small sample, interviews or direct observation may be possible, while these methods become tedious for large samples. For larger samples, questionnaires are the better and more practicable method of data collection. Interviews and observation will also be cost-effective and easy for smaller groups, while questionnaires will be the more convenient, easier and cost-effective method for larger samples.
Distribution of the target population: If the target population is spread over a large geographical area, it will not be possible to carry out interviews or observation; mailed questionnaires may therefore be a better option, being more convenient and cost-effective in such conditions.
Time frame of the study: If a research study runs over a long period, the researcher may be able to use less-structured methods of data collection to gain in-depth information, while short time-frame studies may not allow the use of unstructured methods, since the researcher gets very little time for data collection and analysis. Therefore, structured methods of data collection are used more in short-term research designs.
Literacy level of the subjects: Illiterate subjects put constraints on the use of self-responding methods of data collection such as questionnaires. For illiterate subjects, an interview conducted in the native language is one of the few possible methods of data collection, while more varied and numerous options are available for literate subjects.
Availability of resources and manpower: Some methods of data collection, such as interviews and observation, require more resources and manpower than the use of questionnaires. Therefore, the availability of resources and manpower also affects the selection of methods of data collection.
Researcher's knowledge level and competence: The researcher's knowledge and competence also affect the selection of methods of data collection. For example, conducting an interview or observation may require special social and psychological knowledge, skills, and competence, while the use of questionnaires may not demand these skills; however, the development and construction of a good questionnaire requires good writing skills.
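Several of the factors above can be combined into a simple rule of thumb. The sketch below is purely illustrative (the function name, the thresholds, and the returned labels are assumptions, not from the source); it encodes three of the factors discussed, roughly in order of priority: literacy level, geographic spread, and sample size.

```python
# Illustrative heuristic for choosing a data collection method, based on
# three of the factors discussed above. Thresholds are arbitrary assumptions.
def suggest_method(sample_size, subjects_literate, widely_dispersed):
    """Return a rough suggestion for a data collection method."""
    if not subjects_literate:
        # Illiterate subjects rule out self-responding questionnaires.
        return "interview (in native language)"
    if widely_dispersed:
        # A widely spread population makes interviews/observation impractical.
        return "mailed questionnaire"
    if sample_size <= 50:
        # Small samples allow in-depth, less-structured methods.
        return "interview or direct observation"
    # Large, literate, local samples: questionnaires are cheapest and easiest.
    return "questionnaire"

print(suggest_method(30, True, False))    # prints "interview or direct observation"
print(suggest_method(500, True, True))    # prints "mailed questionnaire"
print(suggest_method(100, False, False))  # prints "interview (in native language)"
```

In practice these factors interact with the purpose of the study, the time frame, and the available resources, so no single rule replaces the researcher's judgment.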
EVALUATING DATA COLLECTION METHODS:
• Is the data collection method complete in all aspects of the study and the study variables?
• Are the data collection methods thoroughly described?
• Are the data collection methods in accordance with the research questions/hypotheses to be tested?
• Are the validity and reliability of the data collection methods established?
• Is the number of methods used sufficient for complete coverage of the research data, or are additional methods required?
• Are anonymity and confidentiality assured?
• Are the instruments described in detail?
• Were the criterion measures or scoring methods clearly established?
What Is Reliability?
As with validity, reliability is an attribute of a measurement instrument – for example, a survey, a weight scale or even a blood pressure monitor. But while validity is concerned with whether the instrument is measuring the "thing" it is supposed to be measuring, reliability is concerned with consistency and stability. In other words, reliability reflects the degree to which a measurement instrument produces consistent results when applied repeatedly to the same phenomenon, under the same conditions. An instrument that achieves a high level of consistency is naturally more dependable (or reliable) than one that does not: it can be trusted to provide consistent measurements, which is exactly what you want when undertaking empirical research. In a more domestic context, imagine that your bathroom scale gave you a different number every time you stepped on it – you would not feel very confident in its ability to measure your body weight.
Reliability also extends to the person using the measurement instrument. For example, if two researchers use the same instrument (say, a measuring tape) and get different measurements, there is likely an issue with how one (or both) of them is using it. So when you think about reliability, consider both the instrument and the researcher as part of the equation.
As with validity, there are various types of reliability and various tests that can be used to assess the reliability of an instrument. A popular one for survey instruments is Cronbach's alpha, a statistical measure that quantifies the degree to which items within an instrument (for example, a set of Likert scales) measure the same underlying construct – in other words, how closely related the items are and whether they consistently capture the same concept. Cronbach's alpha is thus a measure of internal consistency and is widely treated as a measure of scale reliability. Note, however, that a "high" value of alpha does not imply that the measure is unidimensional.
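Cronbach's alpha can be computed directly from respondents' item scores: with k items, alpha = (k / (k - 1)) * (1 - (sum of the item variances) / (variance of the total scores)). The sketch below applies this formula to a small set of hypothetical Likert-scale responses (the data are invented for illustration):

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).
# Rows are respondents, columns are Likert-scale items.
def cronbach_alpha(scores):
    k = len(scores[0])                       # number of items
    def var(xs):                             # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [var([row[i] for row in scores]) for i in range(k)]
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of six people to three related Likert items.
data = [
    [4, 5, 4], [3, 3, 4], [5, 5, 5],
    [2, 3, 2], [4, 4, 5], [3, 2, 3],
]
print(round(cronbach_alpha(data), 3))  # → 0.91
```

A value this high suggests the three items consistently capture the same construct; by convention, values of roughly 0.7 or above are usually considered acceptable for a scale.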
• Validity is concerned with whether an instrument (e.g., a set of Likert scales) is measuring what it is supposed to measure.
• Reliability is concerned with whether that measurement is consistent and stable when measuring the same phenomenon under the same conditions.
In short, validity and reliability are both essential to ensuring that your data collection efforts deliver high-
quality, accurate data that help you answer your research questions.
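The contrast can be made concrete with a small numerical sketch (the readings below are invented for illustration): a bathroom scale that consistently reads about 2 kg too heavy is reliable but not valid, while a scale whose readings scatter widely around the true weight is unreliable even though it is roughly unbiased.

```python
# Hypothetical repeated weighings of a person whose true weight is 70 kg.
true_weight = 70.0
biased_scale  = [72.0, 72.1, 71.9, 72.0, 72.0]  # consistent, but ~2 kg off
erratic_scale = [68.0, 73.5, 66.0, 74.0, 68.5]  # centred near 70, but scattered

def mean(xs):
    return sum(xs) / len(xs)

def spread(xs):  # sample standard deviation: low spread = high consistency
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# Reliability ~ low spread; validity ~ mean close to the true value.
print(f"biased scale:  mean={mean(biased_scale):.1f} kg, spread={spread(biased_scale):.2f} kg")
print(f"erratic scale: mean={mean(erratic_scale):.1f} kg, spread={spread(erratic_scale):.2f} kg")
```

The biased scale shows a tiny spread (reliable) but a mean far from 70 kg (not valid); the erratic scale averages close to the true weight but with a large spread (not reliable). Good measurement needs both.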
MAHARISHI MARKANDESHWAR (DEEMED TO BE) UNIVERSITY
MULLANA (AMBALA)
COURSE: RESEARCH METHODOLOGY AND QUANTITATIVE METHODS
Dr. Deepa Sharma, MM University, India