2. Collection of Data
(Statistics)
Collection of Data
Definition -
Data collection is defined as the procedure of collecting, measuring and analyzing accurate
insights for research using standard validated techniques. A researcher can evaluate their
hypothesis on the basis of collected data. In most cases, data collection is the primary and
most important step for research, irrespective of the field of research. The approach of data
collection is different for different fields of study, depending on the required information.
Data are a tool that helps in reaching a sound conclusion by providing the information needed for it.
For a statistical investigation, collection of data is the first and foremost step.
Sources of Data:
Primary Source
Secondary Sources
Published sources
Un-published sources
Primary Data: Data originally collected in the process of investigation are known as
primary data. This is the original form of data, collected for the first time and
directly from its source of origin.
Secondary Data: Data that have already been collected and processed by some other agency
are known as secondary data.
Bases of difference between primary and secondary data: accuracy, originality, cost,
need of modification.
Meaning
Primary data: first-hand data gathered by the researcher himself.
Secondary data: data collected by someone else earlier.
Specific
Primary data: always specific to the researcher’s needs.
Secondary data: may or may not be specific to the researcher’s need.
Published sources:
Government publications.
Semi-government publications.
ECONOMICS Collection of Data
Private publications, e.g., journals and newspapers, and publications of research
institutes and trade associations.
International publications.
Unpublished Sources -
The statistical data needn’t always be published. There are various sources of unpublished
statistical material such as the records maintained by private firms, business enterprises,
scholars, research workers, etc. They may not like to release their data to any outside
agency.
Important Questions
(a) equal
(b) unequal
(c) zero
(d) none of these
7. Which of the following methods is used for the estimation of population in a country?
(a) Census method
(b) Sampling method
(c) Both (a) and (b)
(d) None of these
8. Personal bias is possible under:
(a) random sampling
(b) purposive sampling
(c) stratified sampling
(d) quota sampling
9. If the investigator wants to select a sample on the basis of diverse characteristics of
the population, which method should he use?
(a) Convenience sampling method
(b) Quota sampling method
(c) Stratified sampling method
(d) Both (b) and (c)
10. For drawing a lottery, _________________ sampling is used.
(a) random
(b) purposive
(c) stratified
(d) quota
11. Which of the following methods is used for the estimation of population in a country?
(a) Sampling Method
(b) Census Method
(c) Both (a) and (b)
(d) Neither (a) nor (b)
12. What kind of data, for the government, are contained in the census of population and
national income estimates?
(a) Primary data
(b) Secondary data
(c) Internal data
(d) None of these
13. The data collected on the height of a group of students after recording their heights
with a measuring tape are:
(a) Primary data
(b) Continuous data
(c) Discrete data
(d) Secondary data
14. Which of the following is a method of secondary data collection?
(a) Direct personal investigation
(b) Direct oral investigation
(c) Collection of information through questionnaire
(d) None of these
15. In random sampling:
(a) Each element has equal chance of being selected
(b) Sample is always full of bias
(c) Cost involved is very less
(d) Cost involved is high
Question 7 The progress report of a railway published by the railway department is what
kind of data?
Question 8 When is a direct personal investigation suitable for primary data collection?
Question 9 What are the qualities of a good questionnaire?
Question 10 Why is a pilot survey important?
Question 11 What is the universe in statistics?
Question 12 Define sample.
Question 13 Define the census method.
Question 14 Explain the sample method.
Question 15 What do you mean by random sampling?
Question 16 What is purposive or deliberate sampling?
Question 17 Define stratified or mixed sampling.
Question 18 Explain systematic sampling.
Question 19 What is quota sampling?
Question 20 What is convenience sampling?
ANSWER KEY
Multiple Choice Answers-
1. B
2. A
3. C
4. D
5. D
6. A
7. A
8. B
9. D
10. A
11. B
12. B
13. A
14. D
15. A
14. Answer: It is a method of collecting data in which a sample of items from the group is
examined, and conclusions about the whole group are drawn on that basis.
15. Answer: In this method, every item of the universe has an equal chance of being
selected in the sample.
16. Answer: It is a sampling method in which the investigator deliberately chooses the
items that, in his opinion, best represent the population.
17. Answer: In this method, the universe is divided into groups (strata) having different
characteristics, and items are selected from each group, so that the entire universe is
represented.
18. Answer: In systematic sampling, population units are arranged in alphabetical,
numerical, or geographical order. Every nth item is then selected for the
sample.
19. Answer: Here, the universe is divided into sections or groups in terms of their
characteristics, a quota of items is fixed for each group, and the investigator selects
the items within that quota.
20. Answer: In this method, sampling is done according to the investigator’s convenience.
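The random, systematic and stratified methods described in the answers above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the universe of 20 students and the two strata are hypothetical and are not taken from the text.

```python
import random

def simple_random_sample(universe, k, seed=None):
    """Random sampling: every item has an equal chance of selection (a lottery-style draw)."""
    rng = random.Random(seed)
    return rng.sample(universe, k)

def systematic_sample(universe, step, start=0):
    """Systematic sampling: take every `step`-th item from an ordered universe."""
    return universe[start::step]

def stratified_sample(strata, k_per_stratum, seed=None):
    """Stratified sampling: draw a random sample of the same size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(items, k_per_stratum) for name, items in strata.items()}

# Hypothetical universe: 20 students arranged in numerical order.
students = [f"S{i:02d}" for i in range(1, 21)]
# Hypothetical strata: the universe split into two groups.
strata = {"group_a": students[:10], "group_b": students[10:]}

print(simple_random_sample(students, 5, seed=1))
print(systematic_sample(students, step=4))   # every 4th student, starting from the first
print(stratified_sample(strata, 2, seed=1))
```

In the systematic case the selection is fully determined by the ordering and the step, whereas the random and stratified draws change with the seed.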
Ans. 2 Secondary Source of collection of data implies obtaining the relevant statistical
information from an agency, or an institution which is already in possession of that
information. To continue with the previous example, data relating to the quality of life of
the people of your town (or the data on per capita expenditure) may have already been
collected by the State Government. You can simply approach the concerned Government
department and request for the desired information. This will be a Secondary Source of
data for you. Thus, secondary source implies that the desired statistical information
already exists and you are simply to collect it from the concerned agency or the
department. You are not to conduct statistical survey(s) yourself and you are not to
contact the respondents (people offering basic information). Of course, you are not
getting first hand information relating to your statistical study. You are simply relying on
the information which is already existing.
Secondary source of data implies collection of data from some agency or institution which
already happens to have collected the data through statistical survey(s). It does not offer
you first-hand information relating to your statistical study. You are to rely on the
information which is already existing.
Ans. 3 The following are some principal differences between primary and secondary data:
(1) Difference in Originality: Primary data are original because these are collected by the
investigator from the source of their origin. Against this, secondary data are already in
existence and therefore, are not original.
(2) Difference in Objective: Primary data are always related to a specific objective of the
investigator. These data, therefore, do not need any adjustment for the concerned study.
On the other hand, secondary data have already been collected for some other purpose.
Therefore, these data need to be adjusted to suit the objective of study in hand.
(3) Difference in Cost of Collection: Primary data are costlier in terms of time, money and
efforts involved than the secondary data. This is because primary data are collected for
the first time from their source of origin. Secondary data are simply collected from the
published or unpublished reports. Accordingly, these are much less expensive.
Of course, it may be noted that there are no fundamental differences between primary
data and secondary data. Data are data, whether primary or secondary. These are
classified as primary or secondary just on the basis of their collection: first-hand or
second-hand. Thus, a particular set of data when collected by the investigator for a
specific purpose from the source of origin, would be primary data. And the same set of
data, when used by some other investigator for his own purpose, would be known as
secondary data. Thus, Secrist has rightly pointed out, “The distinction between primary
and secondary data is one of degree. Data which are primary in the hands of one party
may be secondary in the hands of another.”
Primary and Secondary Data—The Basic Difference
If we are collecting data from its source of origin, for the first time, it is primary
data.
If we are using data which have already been collected by somebody else, it is
secondary data.
Note: If you are getting data from somebody else who collected it from its source of origin
but did not use it for his own study, it will be deemed as primary data.
Ans. 4 The direct personal investigation is the method by which data are personally
collected by the investigator from the informants. In other words, the investigator
establishes direct relation with the persons from whom the information is to be obtained.
The success of this method, however, requires that the investigator should be very
diligent, efficient, impartial and tolerant.
Direct contact with the workers of an industry to obtain information about their economic
conditions is an example of this method.
Suitability
This method of collecting primary data is suitable particularly when:
(i) the field of investigation is limited or not very large.
(ii) a greater degree of originality of the data is required.
(iii) information is to be kept secret.
(iv) accuracy of data is of great significance, and
(v) direct contact with the informants is required.
Merits
Data, thus, collected have the following merits:
(i) Originality: Data have a high degree of originality.
(ii) Accuracy: Data are fairly accurate when personally collected.
(iii) Reliability: Because the information is collected by the investigator himself, reliability
of the data is not doubted.
(iv) Related Information: When in direct contact with the informants, the investigator may
obtain other related information as well.
(v) Uniformity: There is a fair degree of uniformity in the data collected by the investigator
himself from the informants. It facilitates comparison.
(vi) Elastic: This method is fairly elastic because the investigator can always make
necessary adjustments in his set of questions.
Demerits
However, the method of direct personal investigation suffers from certain demerits, as
under:
(i) Difficult to Cover Wide Areas: Direct personal investigation becomes very difficult when
the area of the study is very wide.
(ii) Personal Bias: This method is highly prone to personal bias of the investigator. As a
result, the data may lose their credibility.
(iii) Costly: This method is very expensive in terms of the time, money and efforts
involved.
(iv) Limited Coverage: In this method, area of investigation is generally small. The results
are, therefore, less representative. This may lead to wrong conclusions.
Ans. 5 Indirect oral investigation is the method by which information is obtained not from
the persons about whom it is needed, but orally from other persons who are expected to
possess the necessary information. These other persons are known as witnesses. For
example, by this method, the data on the economic conditions of the workers may be
collected from their employers rather than from the workers themselves.
Suitability
This method is suitable particularly when:
(i) the field of investigation is relatively large.
(ii) it is not possible to have direct contact with the concerned informants.
(iii) the concerned informants are not capable of giving information because of their
ignorance or illiteracy.
(iv) investigation is so complex in nature that only experts can give information.
This method is mostly used by government or non-government committees or
commissions.
Merits
Some of the notable merits of this method are as under:
(i) Wide Coverage: This method can be applied even when the field of investigation is very
wide.
(ii) Less Expensive: This is relatively a less expensive method as compared to Direct
Personal Investigation.
(iii) Expert Opinion: Using this method an investigator can seek opinion of the experts and
thereby can make his information more reliable.
(iv) Free from Bias: This method is relatively free from the personal bias of the
investigator.
(v) Simple: This is relatively a simple approach of data collection.
Demerits
However, there are some demerits, as under:
(i) Less Accurate: The data collected by this method are relatively less accurate. This is
because the information is obtained from persons other than the concerned informants.
(ii) Biased: There is possibility of personal bias of the witnesses giving information.
(iii) Doubtful Conclusions: This method may lead to doubtful conclusions due to
carelessness of the witnesses.
Ans. 6 The difference between direct personal investigation and indirect oral investigation
is as under:
(i) In the case of direct personal investigation, the investigator establishes direct contact
with the informants. On the other hand, in the case of indirect oral investigation,
information is obtained by contacting persons other than those about whom information is
sought.
(ii) Direct Personal Investigation is generally possible when the field of investigation is
small. On the other hand, indirect oral investigation is generally preferred when the field
of investigation is relatively large.
(iii) In the Direct Personal Investigation, the investigator must be well versed in the
language and cultural habits of the informants. There is no such requirement in the case
of Indirect Oral Investigation.
(iv) Direct investigation is relatively costlier than the indirect investigation.
Ans. 7 Under the method of collecting information through local sources or correspondents,
the investigator appoints local persons or correspondents at different places. They collect
information in their own way and furnish the same to the investigator.
Suitability
This method is suitable particularly when:
(i) regular and continuous information is needed.
(ii) the area of investigation is large.
(iii) the information is to be used by journals, magazines, radio, TV, etc. and
(iv) a very high degree of accuracy of information is not required.
Merits
Principal merits of this method are as under:
(i) Economical: This method is quite economical in terms of time, money or efforts
involved.
(ii) Wide Coverage: This method allows a fairly wide coverage of investigation.
(iii) Continuity: The correspondents keep on supplying almost regular information.
(iv) Suitable for Special Purpose: This method is particularly suitable for some
special-purpose investigations, e.g., price quotations from the different grain markets for
the construction of Index Number of agricultural prices.
Demerits
Following are some notable demerits of this method:
(i) Loss of Originality: Originality of data is sacrificed owing to the lack of personal contact
with the respondents.
(ii) Lack of Uniformity: There is a lack of uniformity in the data. This is because the data
are collected by a number of correspondents, each in his own way.
(iii) Personal Bias: This method suffers from the personal bias of the correspondents.
(iv) Less Accurate: The data collected by this method are not very accurate.
(v) Delay in Collection: Generally, there is a delay in the collection of information through
this method.
Ans. 9 Statistical errors are broadly classified as (i) sampling errors, and (ii) non-sampling
errors. Following are the details:
(i) Sampling Errors: These are related to the size or nature of the sample selected for the
study. Due to a very small size of the sample selected for study or due to the
non-representative nature of the sample, the estimated value may differ from the actual
value of a parameter. The error thus emerging is called sampling error. For example, if the
estimated value of a parameter is found to be 10 while the actual/true value is 20, then
sampling error = estimated value of the parameter - true value of the parameter
= 10 - 20 = -10.
(ii) Non-sampling Errors: These are errors related to the collection of data. These are of
the following types:
Error of Measurement: Error of measurement may occur due to (a) difference in the
scale of measurement, and (b) difference in the rounding off procedure adopted by
different investigators.
Error of Non-response: This arises when the respondents do not offer the required
information.
Error of Misinterpretation: This arises when the respondent fails to interpret
the questions in the questionnaire.
Error of Calculation or Arithmetical Error: It occurs in the course of addition, subtraction
or multiplication of data.
Error of Sampling Bias: It occurs when, for some reason or the other, a part of target
population, cannot be included in the choice of a sample.
The larger the field of investigation or the population size, the greater the possibility of
errors related to the collection of data, or data acquisition. It must be noted here that a
non-sampling error is more serious than a sampling error, because a sampling error can
be minimised by opting for a larger sample size. No such possibility exists in the case of
non-sampling errors.
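The sampling-error formula used above (estimated value minus true value) can be checked with a short Python sketch. The population of 1,000 values and the sample sizes are hypothetical and serve only as an illustration; the sketch also shows the limiting case in which the sample covers the whole population and the sampling error disappears.

```python
import random
from statistics import mean

def sampling_error(estimate, true_value):
    """Sampling error = estimated value of the parameter - true value of the parameter."""
    return estimate - true_value

# The worked example from the text: estimate 10, true value 20.
print(sampling_error(10, 20))  # -10

# Hypothetical population of 1,000 values between 100 and 200.
rng = random.Random(0)
population = [rng.randint(100, 200) for _ in range(1000)]
true_mean = mean(population)

# The error of the sample mean tends to shrink as the sample grows;
# when the sample is the whole population, it is exactly zero.
for n in (10, 100, len(population)):
    sample = rng.sample(population, n)
    print(n, sampling_error(mean(sample), true_mean))
```

Non-sampling errors, such as errors of measurement or non-response, are not captured by this calculation, which is why enlarging the sample does not remove them.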